A robust predictive diagnosis model for diabetes mellitus using Shapley-incorporated machine learning algorithms

With the rapid advancement and integration of Artificial Intelligence (AI) in medicine, the need for new developments and greater accuracy in Machine Learning (ML) models and algorithms has increased substantially. Meanwhile, the limited availability of medical data has hampered the rapid advancement of AI,...


Bibliographic Details
Main Authors: Chukwuebuka Joseph Ejiyi, Zhen Qin, Joan Amos, Makuachukwu Bennedith Ejiyi, Ann Nnani, Thomas Ugochukwu Ejiyi, Victor Kwaku Agbesi, Chidimma Diokpo, Chidinma Okpara
Format: Article
Language: English
Published: Elsevier 2023-11-01
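Series: Healthcare Analytics
Subjects: Machine learning; Shapley additive explanation; Attributes; Diabetes mellitus; Feature engineering
Online Access: http://www.sciencedirect.com/science/article/pii/S2772442523000333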
author Chukwuebuka Joseph Ejiyi
Zhen Qin
Joan Amos
Makuachukwu Bennedith Ejiyi
Ann Nnani
Thomas Ugochukwu Ejiyi
Victor Kwaku Agbesi
Chidimma Diokpo
Chidinma Okpara
collection DOAJ
description With the rapid advancement and integration of Artificial Intelligence (AI) in medicine, the need for new developments and greater accuracy in Machine Learning (ML) models and algorithms has increased substantially. Meanwhile, the limited availability of medical data has hampered the rapid advancement of AI, so robust models that can leverage the available data are needed. In this paper, we propose robust frameworks for the predictive diagnosis of diabetes mellitus using limited data collected from women aged 21 to 81. The proposed frameworks share data augmentation, attribute analysis, and missing-data imputation as preliminary steps. We used Shapley Additive Explanations (SHAP) to extract feature importance and identify the most important features for fitting Extra Tree (ET), Random Forest (RF), AdaBoost, and XGBoost models. SHAP shows that glucose is the single feature that contributes most to the prediction of diabetes, and that its impact is much greater in combination with age and Body Mass Index (BMI). Additionally, BMI and the diabetes pedigree function also rank high for the prediction of diabetes; so, if blood glucose is hard to manage, the focus can be switched to the management of BMI and the diabetes pedigree function. Informed by SHAP, we use a new dataset derived from the original one to fit the ML algorithms used for the prediction of diabetes, among which XGBoost and AdaBoost performed better than the other models, with an accuracy of 94.67% each and F1 scores of 95.27 and 95.95, respectively.
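The Shapley-based feature ranking mentioned in the description can be illustrated with a minimal sketch. This is not the authors' pipeline and does not use the SHAP library: it computes exact Shapley values by enumerating feature subsets for a hypothetical toy risk function. The feature names, weights, baseline, and patient records below are all made-up assumptions for illustration, and the paper's tree-ensemble models are replaced by a hand-written scoring rule.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names; "skin_thickness" is deliberately ignored by the toy model.
FEATURES = ["glucose", "bmi", "age", "pedigree", "skin_thickness"]

# Hypothetical reference patient: supplies the value used when a feature is "absent".
BASELINE = {"glucose": 110.0, "bmi": 27.0, "age": 33.0,
            "pedigree": 0.4, "skin_thickness": 20.0}

# Made-up records for illustration only; not data from the paper.
PATIENTS = [
    {"glucose": 168.0, "bmi": 33.5, "age": 50.0, "pedigree": 0.6, "skin_thickness": 35.0},
    {"glucose": 95.0, "bmi": 24.0, "age": 25.0, "pedigree": 0.2, "skin_thickness": 18.0},
    {"glucose": 150.0, "bmi": 29.0, "age": 61.0, "pedigree": 1.1, "skin_thickness": 30.0},
]


def toy_risk(x):
    """Hypothetical risk score with invented weights and one glucose-BMI interaction."""
    score = 0.03 * x["glucose"] + 0.05 * x["bmi"] + 0.01 * x["age"] + 0.5 * x["pedigree"]
    if x["glucose"] > 140 and x["bmi"] > 30:
        score += 1.0
    return score


def shapley_values(model, x, baseline, features=FEATURES):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition take their baseline value. The cost grows
    as 2**len(features), so this only suits a handful of features.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without_f = {g: (x[g] if g in subset else baseline[g]) for g in features}
                with_f = dict(without_f)
                with_f[f] = x[f]
                total += weight * (model(with_f) - model(without_f))
        phi[f] = total
    return phi


# Global importance: mean absolute Shapley value per feature across the records.
importance = {f: 0.0 for f in FEATURES}
for patient in PATIENTS:
    phi = shapley_values(toy_risk, patient, BASELINE)
    for f in FEATURES:
        importance[f] += abs(phi[f]) / len(PATIENTS)

# Keep the top-ranked features, mirroring the idea of refitting on a reduced dataset.
ranked = sorted(FEATURES, key=lambda f: importance[f], reverse=True)
selected = ranked[:3]
print(ranked, selected)
```

In this sketch a feature the scoring rule never reads receives a Shapley value of exactly zero, and the values for one record sum to the difference between its score and the baseline score. A real workflow would more likely apply a tree explainer to the fitted ensemble models rather than enumerate subsets.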
first_indexed 2024-03-13T03:27:17Z
format Article
id doaj.art-5d427cbc09ae44b783649b1bce613636
institution Directory Open Access Journal
issn 2772-4425
language English
last_indexed 2024-03-13T03:27:17Z
publishDate 2023-11-01
publisher Elsevier
record_format Article
series Healthcare Analytics
spelling doaj.art-5d427cbc09ae44b783649b1bce613636 | 2023-06-25T04:44:15Z | eng | Elsevier | Healthcare Analytics | 2772-4425 | 2023-11-01 | 3 | 100166
A robust predictive diagnosis model for diabetes mellitus using Shapley-incorporated machine learning algorithms
Chukwuebuka Joseph Ejiyi (0): School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China
Zhen Qin (1): School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China; Corresponding author.
Joan Amos (2): School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
Makuachukwu Bennedith Ejiyi (3): Pharmacy Department, University of Nigeria Nsukka, Enugu, Nigeria
Ann Nnani (4): College of Environmental Science and Engineering, Hohai University, Nanjing, Jiangsu, China
Thomas Ugochukwu Ejiyi (5): Department of Pure and Industrial Chemistry, University of Nigeria Nsukka, Enugu, Nigeria
Victor Kwaku Agbesi (6): School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
Chidimma Diokpo (7): Department of Food Science and Technology, Federal University of Technology Owerri, Imo State, Nigeria
Chidinma Okpara (8): Department of Biochemistry, Federal University of Technology Owerri, Imo State, Nigeria
http://www.sciencedirect.com/science/article/pii/S2772442523000333
Machine learning; Shapley additive explanation; Attributes; Diabetes mellitus; Feature engineering
title A robust predictive diagnosis model for diabetes mellitus using Shapley-incorporated machine learning algorithms
topic Machine learning
Shapley additive explanation
Attributes
Diabetes mellitus
Feature engineering
url http://www.sciencedirect.com/science/article/pii/S2772442523000333