Explainable machine learning models for Medicare fraud detection

Abstract As a means of building explainable machine learning models for Big Data, we apply a novel ensemble supervised feature selection technique. The technique is applied to publicly available insurance claims data from the United States public health insurance program, Medicare. We approach Medic...


Bibliographic Details
Main Authors:	John T. Hancock, Richard A. Bauder, Huanjing Wang, Taghi M. Khoshgoftaar
Format:	Article
Language:	English
Published:	SpringerOpen 2023-10-01
Series:	Journal of Big Data
Subjects:	Big Data; Class imbalance; Explainable machine learning models; Ensemble supervised feature selection; Medicare fraud detection
Online Access:	https://doi.org/10.1186/s40537-023-00821-5

author	John T. Hancock, Richard A. Bauder, Huanjing Wang, Taghi M. Khoshgoftaar
collection	DOAJ
description	Abstract As a means of building explainable machine learning models for Big Data, we apply a novel ensemble supervised feature selection technique. The technique is applied to publicly available insurance claims data from the United States public health insurance program, Medicare. We approach Medicare insurance fraud detection as a supervised machine learning task of anomaly detection through the classification of highly imbalanced Big Data. Our objectives for feature selection are to increase efficiency in model training, and to develop more explainable machine learning models for fraud detection. Using two Big Data datasets derived from two different sources of insurance claims data, we demonstrate how our feature selection technique reduces the dimensionality of the datasets by approximately 87.5% without compromising performance. Moreover, the reduction in dimensionality results in machine learning models that are easier to explain, and less prone to overfitting. Therefore, our primary contribution of the exposition of our novel feature selection technique leads to a further contribution to the application domain of automated Medicare insurance fraud detection. We utilize our feature selection technique to provide an explanation of our fraud detection models in terms of the definitions of the selected features. The ensemble supervised feature selection technique we present is flexible in that any collection of machine learning algorithms that maintain a list of feature importance values may be used. Therefore, researchers may easily employ variations of the technique we present.
first_indexed	2024-03-10T17:40:55Z
format	Article
id	doaj.art-89c068cfa9654be2a0a8a7368f4bcc4f
institution	Directory of Open Access Journals
issn	2196-1115
language	English
last_indexed	2024-03-10T17:40:55Z
publishDate	2023-10-01
publisher	SpringerOpen
record_format	Article
series	Journal of Big Data
spelling	doaj.art-89c068cfa9654be2a0a8a7368f4bcc4f | 2023-11-20T09:42:32Z | eng | SpringerOpen | Journal of Big Data | 2196-1115 | 2023-10-01 | 101131 | 10.1186/s40537-023-00821-5 | Explainable machine learning models for Medicare fraud detection | John T. Hancock (College of Engineering and Computer Science, Florida Atlantic University) | Richard A. Bauder (College of Engineering and Computer Science, Florida Atlantic University) | Huanjing Wang (Ogden College of Science and Engineering, Western Kentucky University) | Taghi M. Khoshgoftaar (College of Engineering and Computer Science, Florida Atlantic University) | https://doi.org/10.1186/s40537-023-00821-5 | Big Data; Class imbalance; Explainable machine learning models; Ensemble supervised feature selection; Medicare fraud detection
title	Explainable machine learning models for Medicare fraud detection
topic	Big Data; Class imbalance; Explainable machine learning models; Ensemble supervised feature selection; Medicare fraud detection
url	https://doi.org/10.1186/s40537-023-00821-5


Similar Items