A robust predictive diagnosis model for diabetes mellitus using Shapley-incorporated machine learning algorithms
With the rapid advancement and integration of Artificial Intelligence (AI) in medicine, the need for new developments and accuracy in Machine Learning (ML) models and algorithms has increased substantially. Meanwhile, the limited availability of medical data has hampered the rapid advancement of AI,...
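Main Authors: | Chukwuebuka Joseph Ejiyi, Zhen Qin, Joan Amos, Makuachukwu Bennedith Ejiyi, Ann Nnani, Thomas Ugochukwu Ejiyi, Victor Kwaku Agbesi, Chidimma Diokpo, Chidinma Okpara |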
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2023-11-01 |
Series: | Healthcare Analytics |
Subjects: | Machine learning, Shapley additive explanation, Attributes, Diabetes mellitus, Feature engineering |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2772442523000333 |
_version_ | 1797796050674647040 |
---|---|
author | Chukwuebuka Joseph Ejiyi, Zhen Qin, Joan Amos, Makuachukwu Bennedith Ejiyi, Ann Nnani, Thomas Ugochukwu Ejiyi, Victor Kwaku Agbesi, Chidimma Diokpo, Chidinma Okpara |
author_sort | Chukwuebuka Joseph Ejiyi |
collection | DOAJ |
description | With the rapid advancement and integration of Artificial Intelligence (AI) in medicine, the need for new developments and greater accuracy in Machine Learning (ML) models and algorithms has increased substantially. Meanwhile, the limited availability of medical data has hampered the rapid advancement of AI, and robust models that can leverage the available data are needed. In this paper, we propose robust frameworks for the predictive diagnosis of diabetes mellitus using the limited data generated among women aged 21 to 81. The proposed frameworks share data augmentation, attribute analysis, and missing-data imputation as preliminary steps. We used Shapley Additive Explanation (SHAP) to extract feature importance and identify the most important features for fitting Extra Tree (ET), Random Forest (RF), Adaboost, and Xgboost models. SHAP shows that glucose is the single feature that contributes most to the prediction of diabetes, and that its impact is much greater in combination with age and Body Mass Index (BMI). Additionally, BMI and the diabetes pedigree function also rank high for the prediction of diabetes; so, if it is hard to manage blood glucose, the focus can be switched to the management of BMI and the diabetes pedigree function. Informed by SHAP, we use a new dataset derived from the original one to fit the ML algorithms used for the prediction of diabetes, for which Xgboost and Adaboost performed better than the other models, with an accuracy of 94.67% each and F1 scores of 95.27 and 95.95, respectively. |
first_indexed | 2024-03-13T03:27:17Z |
format | Article |
id | doaj.art-5d427cbc09ae44b783649b1bce613636 |
institution | Directory of Open Access Journals |
issn | 2772-4425 |
language | English |
last_indexed | 2024-03-13T03:27:17Z |
publishDate | 2023-11-01 |
publisher | Elsevier |
record_format | Article |
series | Healthcare Analytics |
spelling | doaj.art-5d427cbc09ae44b783649b1bce613636; 2023-06-25T04:44:15Z; eng; Elsevier; Healthcare Analytics; 2772-4425; 2023-11-01; 3; 100166; A robust predictive diagnosis model for diabetes mellitus using Shapley-incorporated machine learning algorithms; Authors and affiliations: Chukwuebuka Joseph Ejiyi (School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China); Zhen Qin (School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China; corresponding author); Joan Amos (School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China); Makuachukwu Bennedith Ejiyi (Pharmacy Department, University of Nigeria Nsukka, Enugu, Nigeria); Ann Nnani (College of Environmental Science and Engineering, Hohai University, Nanjing, Jiangsu, China); Thomas Ugochukwu Ejiyi (Department of Pure and Industrial Chemistry, University of Nigeria Nsukka, Enugu, Nigeria); Victor Kwaku Agbesi (School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China); Chidimma Diokpo (Department of Food Science and Technology, Federal University of Technology Owerri, Imo State, Nigeria); Chidinma Okpara (Department of Biochemistry, Federal University of Technology Owerri, Imo State, Nigeria); http://www.sciencedirect.com/science/article/pii/S2772442523000333; Keywords: Machine learning, Shapley additive explanation, Attributes, Diabetes mellitus, Feature engineering |
title | A robust predictive diagnosis model for diabetes mellitus using Shapley-incorporated machine learning algorithms |
topic | Machine learning, Shapley additive explanation, Attributes, Diabetes mellitus, Feature engineering |
url | http://www.sciencedirect.com/science/article/pii/S2772442523000333 |
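
The abstract describes using Shapley values to rank features such as glucose, BMI, and age. As a rough illustration of what a Shapley attribution is, the sketch below computes exact Shapley values for a toy risk score. Everything in it is an assumption made for illustration: the function names (`shapley_values`, `toy_score`, `value_fn`), the coefficients, and the standardized patient values are hypothetical and are not taken from the paper, and the authors' actual pipeline (tree-based models explained with SHAP) is not reproduced here.

```python
from itertools import combinations
from math import factorial

# Hypothetical standardized feature values for one patient (not from the paper).
PATIENT = {"glucose": 2.0, "bmi": 1.0, "age": 1.0}
# Baseline ("feature absent") values, here the standardized mean of zero.
BASELINE = {"glucose": 0.0, "bmi": 0.0, "age": 0.0}


def toy_score(x):
    """Toy risk score with made-up coefficients and one glucose-BMI interaction."""
    return (0.5 * x["glucose"] + 0.2 * x["bmi"] + 0.1 * x["age"]
            + 0.2 * x["glucose"] * x["bmi"])


def value_fn(subset):
    """Score when only the features in `subset` take the patient's values."""
    x = {f: (PATIENT[f] if f in subset else BASELINE[f]) for f in PATIENT}
    return toy_score(x)


def shapley_values(value, features):
    """Exact Shapley values: weighted average of each feature's marginal
    contribution over all subsets of the remaining features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that excludes f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


phi = shapley_values(value_fn, list(PATIENT))
for name, contribution in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {contribution:.3f}")
```

In this toy setup the interaction term is split equally between glucose and BMI, and the attributions sum to the difference between the patient's score and the baseline score, which is the additivity property that gives SHAP its name. Exact enumeration is exponential in the number of features, so it only suits small examples like this one.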