A robust predictive diagnosis model for diabetes mellitus using Shapley-incorporated machine learning algorithms

Bibliographic Details
Main Authors: Chukwuebuka Joseph Ejiyi, Zhen Qin, Joan Amos, Makuachukwu Bennedith Ejiyi, Ann Nnani, Thomas Ugochukwu Ejiyi, Victor Kwaku Agbesi, Chidimma Diokpo, Chidinma Okpara
Format: Article
Language: English
Published: Elsevier 2023-11-01
Series: Healthcare Analytics
Online Access: http://www.sciencedirect.com/science/article/pii/S2772442523000333
Description
Summary: With the rapid advancement and integration of Artificial Intelligence (AI) in medicine, the need for new and more accurate Machine Learning (ML) models and algorithms has increased substantially. Meanwhile, the limited availability of medical data has hampered the advancement of AI, so robust models that can leverage the available data are needed. In this paper, we propose robust frameworks for the predictive diagnosis of diabetes mellitus using limited data collected from women aged 21 to 81. The proposed frameworks share data augmentation, attribute analysis, and missing-data imputation as preliminary steps. We used SHapley Additive exPlanations (SHAP) to extract feature importance and identify the most important features for fitting Extra Tree (ET), Random Forest (RF), AdaBoost, and XGBoost models. SHAP shows that glucose is the single feature that contributes most to the prediction of diabetes, and that its impact is much greater in combination with age and Body Mass Index (BMI). BMI and the diabetes pedigree function also rank high for the prediction of diabetes; so, if blood glucose is hard to manage, the focus can be switched to the management of BMI and the diabetes pedigree function. Informed by SHAP, we use a new dataset derived from the original one to fit the ML algorithms used for the prediction of diabetes, for which XGBoost and AdaBoost performed better than the other models, with an accuracy of 94.67% each and F1 scores of 95.27 and 95.95, respectively.
ISSN: 2772-4425
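
The summary relies on Shapley-based attributions to rank features such as glucose, BMI, and age. As a rough illustration of the underlying idea only, the sketch below computes exact Shapley values by brute force for a three-feature scoring function. Everything in it is an assumption made for illustration: `toy_risk`, its coefficients, the baseline, and the patient values are hypothetical and are not taken from the article, and the SHAP library itself uses faster, model-specific algorithms rather than this enumeration.

```python
from itertools import combinations
from math import factorial

FEATURES = ["glucose", "bmi", "age"]


def toy_risk(x):
    """Hypothetical risk score (NOT the article's model)."""
    score = 0.03 * x["glucose"] + 0.02 * x["bmi"] + 0.01 * x["age"]
    # Illustrative interaction: high glucose together with high BMI adds risk.
    if x["glucose"] > 140 and x["bmi"] > 30:
        score += 0.5
    return score


def coalition_value(subset, x, baseline):
    """Score when features in `subset` take the patient's values
    and all other features stay at the baseline."""
    mixed = {f: (x[f] if f in subset else baseline[f]) for f in FEATURES}
    return toy_risk(mixed)


def shapley_values(x, baseline):
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            # Weight of a coalition of size k in the Shapley formula.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = coalition_value(set(subset) | {f}, x, baseline)
                without_f = coalition_value(set(subset), x, baseline)
                total += weight * (with_f - without_f)
        phi[f] = total
    return phi


# Hypothetical reference point and patient, chosen only for the example.
baseline = {"glucose": 100, "bmi": 25, "age": 30}
patient = {"glucose": 160, "bmi": 34, "age": 50}

phi = shapley_values(patient, baseline)
for name, contribution in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {contribution:.2f}")
```

The attributions sum to the difference between the patient's score and the baseline score, which is the additivity property that makes Shapley values convenient for ranking features. In this toy setup the interaction bonus is split evenly between glucose and BMI, which mirrors in miniature how a feature's attributed impact can grow when it acts in combination with others.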