Feature Engineering and Resampling Strategies for Fund Transfer Fraud With Limited Transaction Data and a Time-Inhomogeneous Modi Operandi

Detecting financial fraud to profile crimes and pinpoint system vulnerabilities is an essential issue in the financial industry. Because of interpretability requirements and the lack of mass transaction data due to privacy regulations, sophisticated handcrafted features have been adopted in much of...


Bibliographic Details
Main Authors:	Yu-Yen Hsin, Tian-Shyr Dai, Yen-Wu Ti, Ming-Chuan Huang, Ting-Hui Chiang, Liang-Chih Liu
Format:	Article
Language:	English
Published:	IEEE 2022-01-01
Series:	IEEE Access
Subjects:	Electronic fund transfer fraud detection; feature engineering; Kolmogorov-Smirnov test; resampling; feature importance ranking
Online Access:	https://ieeexplore.ieee.org/document/9858047/

author	Yu-Yen Hsin, Tian-Shyr Dai, Yen-Wu Ti, Ming-Chuan Huang, Ting-Hui Chiang, Liang-Chih Liu
collection	DOAJ
description	Detecting financial fraud to profile crimes and pinpoint system vulnerabilities is an essential issue in the financial industry. Because of interpretability requirements and the lack of mass transaction data due to privacy regulations, sophisticated handcrafted features have been adopted in much of the literature for fraud detection. In addition to established recency, frequency, monetary, and anomaly features, we propose behavior- and segmentation-type features based on statistical characteristics belonging solely to (non-)fraudulent accounts informed by financial expertise. Our proposed features are difficult for automatic feature generators to synthesize, and provide transparent cause-effect relationships and good prediction results. Features with time-inhomogeneous properties cause popular boosting classifiers such as XGBoost and LGBM to produce unstable detection results. We use the Kolmogorov–Smirnov test to detect and remove these features to improve XGBoost and LGBM detection performance and robustness. The resulting performance shown in our experiments is better than that of other classifiers, such as SVM and random forests. We examine the advantage of our technique by comparing it with several feature engineering works on fraud detection and automatic feature generation methods. On the other hand, we also find that generating training/testing sets with random sampling falsely eliminates such time inhomogeneity and results in misleading assessments of the robustness of machine learning models. These time-inhomogeneous phenomena also entail various modus operandi patterns, which influence the performance of different resampling methods for addressing data imbalance in fraud detection. Improper linear interpolation of SMOTE-related approaches leads to poor performance due to varying patterns of modi operandi. However, synthesizing fraudulent samples with simple oversampling and GANs mitigates this problem.
first_indexed	2024-04-13T18:28:59Z
format	Article
id	doaj.art-c48825e9994043bc83536d3197e95b0e
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-04-13T18:28:59Z
publishDate	2022-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-c48825e9994043bc83536d3197e95b0e; 2022-12-22T02:35:10Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2022-01-01; vol. 10, pp. 86101-86116; DOI 10.1109/ACCESS.2022.3199425; IEEE Xplore document 9858047
	Authors: Yu-Yen Hsin (0); Tian-Shyr Dai (1), https://orcid.org/0000-0002-9226-3056; Yen-Wu Ti (2), https://orcid.org/0000-0002-9834-0075; Ming-Chuan Huang (3); Ting-Hui Chiang (4); Liang-Chih Liu (5), https://orcid.org/0000-0002-2594-0109
	Affiliations, in record order: Institute of Finance, National Yang Ming Chiao Tung University, Hsinchu, Taiwan; Department of Information Management and Finance, National Yang Ming Chiao Tung University, Hsinchu, Taiwan; College of Artificial Intelligence, Yango University, Fuzhou, China; Institute of Computer Science and Engineering, National Yang Ming Chiao Tung University, Hsinchu, Taiwan; Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan; Department of Information and Finance Management, National Taipei University of Technology, Taipei, Taiwan
title	Feature Engineering and Resampling Strategies for Fund Transfer Fraud With Limited Transaction Data and a Time-Inhomogeneous Modi Operandi
topic	Electronic fund transfer fraud detection; feature engineering; Kolmogorov-Smirnov test; resampling; feature importance ranking
url	https://ieeexplore.ieee.org/document/9858047/
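
The description above says the authors use the Kolmogorov–Smirnov test to detect and remove time-inhomogeneous features. The record does not give their procedure, so the following is only a minimal sketch of the general idea, assuming a two-sample test between an earlier and a later period for each feature. The function name, the 0.05 threshold, and the toy data are illustrative assumptions and do not come from the paper.

```python
import numpy as np
from scipy.stats import ks_2samp


def time_inhomogeneous_features(early, late, names, alpha=0.05):
    """Return names of columns whose distribution differs between periods.

    early, late: 2-D arrays (rows = samples, columns = features) taken from
    two non-overlapping time windows. alpha is an assumed significance level.
    """
    flagged = []
    for j, name in enumerate(names):
        # Two-sample KS test: a small p-value suggests the feature's
        # distribution changed between the two periods.
        result = ks_2samp(early[:, j], late[:, j])
        if result.pvalue < alpha:
            flagged.append(name)
    return flagged


# Toy data (hypothetical): column 0 is identical in both periods,
# column 1 shifts completely in the later period.
stable = np.linspace(0.0, 1.0, 200)
early = np.column_stack([stable, np.linspace(0.0, 1.0, 200)])
late = np.column_stack([stable, np.linspace(5.0, 6.0, 200)])
names = ["stable_feature", "shifted_feature"]

flagged = time_inhomogeneous_features(early, late, names)
kept = [n for n in names if n not in flagged]
print("flagged:", flagged)
print("kept:", kept)
```

Splitting the data by time rather than at random matters here: as the description notes, random sampling mixes the periods and hides exactly the shift this check is meant to expose.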

Similar Items