Fraud Detection in Healthcare Insurance Claims Using Machine Learning

Healthcare fraud is the intentional submission of false claims or misrepresentation of facts to obtain entitlement payments. It wastes healthcare financial resources, increases healthcare costs, and therefore poses a substantial financial challenge. Therefore, supervised machine...


Bibliographic Details
Main Authors:	Eman Nabrawi, Abdullah Alanazi
Format:	Article
Language:	English
Published:	MDPI AG 2023-09-01
Series:	Risks
Subjects:	fraud; insurance claims; artificial neural networks (ANN); logistic regression (LR); random forest (RF); Saudi Arabia
Online Access:	https://www.mdpi.com/2227-9091/11/9/160

_version_	1827723697653284864
author	Eman Nabrawi; Abdullah Alanazi
author_facet	Eman Nabrawi; Abdullah Alanazi
author_sort	Eman Nabrawi
collection	DOAJ
description	Healthcare fraud is the intentional submission of false claims or misrepresentation of facts to obtain entitlement payments. It wastes healthcare financial resources, increases healthcare costs, and therefore poses a substantial financial challenge. Supervised machine learning and deep learning methods such as random forest, logistic regression, and artificial neural networks (ANN) have been used successfully to detect healthcare insurance fraud. This study aims to develop a model that automatically detects fraud in health insurance claims in Saudi Arabia and identifies the factors that contribute most to fraud. Three supervised methods (random forest, logistic regression, and ANN) were applied to a labeled, imbalanced dataset obtained from three healthcare providers in Saudi Arabia. The synthetic minority oversampling technique (SMOTE) was used to balance the dataset, and Boruta feature selection was applied to exclude insignificant features. Validation metrics were accuracy, precision, recall, specificity, F1 score, and area under the curve (AUC). The random forest classifier identified policy type, education, and age as the most significant features and achieved an accuracy of 98.21%, precision of 98.08%, recall of 100%, an F1 score of 99.03%, specificity of 80%, and an AUC of 90.00%. Logistic regression achieved an accuracy of 80.36%, precision of 97.62%, recall of 80.39%, an F1 score of 88.17%, specificity of 80%, and an AUC of 80.20%. The ANN achieved an accuracy of 94.64%, precision of 98.00%, recall of 96.08%, an F1 score of 97.03%, specificity of 80%, and an AUC of 88.04%. All three models yielded acceptable accuracy and validation metrics; however, further research on a larger dataset is advised.
first_indexed	2024-03-10T22:04:22Z
format	Article
id	doaj.art-7dde5126de7f43549f563b001efd5180
institution	Directory of Open Access Journals
issn	2227-9091
language	English
last_indexed	2024-03-10T22:04:22Z
publishDate	2023-09-01
publisher	MDPI AG
record_format	Article
series	Risks
spelling	doaj.art-7dde5126de7f43549f563b001efd5180; 2023-11-19T12:51:18Z; eng; MDPI AG; Risks; 2227-9091; 2023-09-01; 11; 9; 160; 10.3390/risks11090160; Fraud Detection in Healthcare Insurance Claims Using Machine Learning; Eman Nabrawi (0); Abdullah Alanazi (1); (0) Health Informatics Department, King Saud Ibn Abdulaziz University for Health Sciences, P.O. Box 3660, Riyadh 11481, Saudi Arabia; (1) Health Informatics Department, King Saud Ibn Abdulaziz University for Health Sciences, P.O. Box 3660, Riyadh 11481, Saudi Arabia; https://www.mdpi.com/2227-9091/11/9/160; fraud; insurance claims; artificial neural networks (ANN); logistic regression (LR); random forest (RF); Saudi Arabia
spellingShingle	Eman Nabrawi; Abdullah Alanazi; Fraud Detection in Healthcare Insurance Claims Using Machine Learning; Risks; fraud; insurance claims; artificial neural networks (ANN); logistic regression (LR); random forest (RF); Saudi Arabia
title	Fraud Detection in Healthcare Insurance Claims Using Machine Learning
title_full	Fraud Detection in Healthcare Insurance Claims Using Machine Learning
title_fullStr	Fraud Detection in Healthcare Insurance Claims Using Machine Learning
title_full_unstemmed	Fraud Detection in Healthcare Insurance Claims Using Machine Learning
title_short	Fraud Detection in Healthcare Insurance Claims Using Machine Learning
title_sort	fraud detection in healthcare insurance claims using machine learning
topic	fraud; insurance claims; artificial neural networks (ANN); logistic regression (LR); random forest (RF); Saudi Arabia
url	https://www.mdpi.com/2227-9091/11/9/160
work_keys_str_mv	AT emannabrawi frauddetectioninhealthcareinsuranceclaimsusingmachinelearning AT abdullahalanazi frauddetectioninhealthcareinsuranceclaimsusingmachinelearning
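
The validation metrics named in the description (accuracy, precision, recall, specificity, F1 score, AUC) can all be derived from a binary confusion matrix. The following Python sketch shows one way to do that. The class name `ConfusionMatrix`, the helper `single_threshold_auc`, and the counts (51, 1, 4, 0) are assumptions made for illustration: the counts are hypothetical values chosen because they reproduce the random forest percentages quoted in the description, and they are not taken from the article. The AUC shown here is the area under a ROC curve built from a single decision threshold, which equals the mean of recall and specificity; the article may have computed AUC differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical helper, not from the article)."""

    tp: int  # positives predicted as positive
    fp: int  # negatives predicted as positive
    tn: int  # negatives predicted as negative
    fn: int  # positives predicted as negative

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Also called sensitivity or true positive rate.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # True negative rate.
        return self.tn / (self.tn + self.fp)

    def f1(self) -> float:
        # Harmonic mean of precision and recall, written in count form.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)

    def single_threshold_auc(self) -> float:
        # Area under the ROC curve through (0,0), (FPR, TPR), (1,1).
        return (self.recall() + self.specificity()) / 2


# Hypothetical counts, assumed for illustration only.
rf = ConfusionMatrix(tp=51, fp=1, tn=4, fn=0)

for name in ("accuracy", "precision", "recall", "specificity", "f1",
             "single_threshold_auc"):
    value = getattr(rf, name)()
    print(f"{name}: {100 * value:.2f}%")
```

With only five negative cases in such a test split, a single misclassified negative moves specificity by 20 percentage points, which illustrates why the description advises further research on a larger dataset.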

Similar Items