An unsupervised cluster-based feature grouping model for early diabetes detection

Bibliographic Details
Main Authors: Md. Mehedi Hassan, Swarnali Mollick, Farhana Yasmin
Format: Article
Language: English
Published: Elsevier 2022-11-01
Series: Healthcare Analytics
Online Access: http://www.sciencedirect.com/science/article/pii/S2772442522000521
Description
Summary: Diabetes mellitus is a hyperglycemic condition that poses a substantial threat to human health, and early detection decreases morbidity and mortality. Because labeled data are scarce and diabetes datasets contain anomalies, it is exceedingly difficult to develop a trustworthy and accurate diabetes prognosis. We propose an unsupervised cluster-based feature grouping model for early diabetes identification using an open-source dataset containing the data of 520 diabetic patients. The dataset and the groupings of its features were clustered using K-means together with the elbow and silhouette methods, and various machine learning approaches were then applied to the cluster-based dataset to predict diabetes. The maximum accuracy (ACC) is 99.57% on the cluster-based dataset and 99.03% on the complete dataset. The best precision, 1.000, is obtained from the multi-layer perceptron (MLP), random forest (RF), and k-nearest neighbors (KNN); the best recall, 0.984, from RF and support vector machine (SVM); the minimum mean squared error (MSE), 0.010, from RF; the maximum MSE, 0.067, from KNN; and the best F1-score, 99.20%, from RF. A comparison table displays the anticipated outcomes and highlights the aspects of this research most likely to perform as intended. The preprocessed data and code are available in the GitHub repository at https://github.com/mhashiq/Early-stage-diabetes-risk-prediction.
ISSN: 2772-4425
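
Illustrative example: the summary mentions K-means clustering guided by the elbow and silhouette methods. The sketch below shows, under stated assumptions, how those two diagnostics are commonly computed. It is not the authors' code: the synthetic 2-D data, the pure-Python K-means with farthest-point initialization, the helper names `kmeans` and `silhouette`, and the range of k values are all hypothetical choices made for illustration.

```python
import math
import random


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with deterministic farthest-point initialization (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))

    def assign(cs):
        # Label each point with the index of its nearest center.
        return [min(range(k), key=lambda j: math.dist(p, cs[j])) for p in points]

    for _ in range(max_iter):
        labels = assign(centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centers[j])  # keep the old center for an empty cluster
        if updated == centers:
            break
        centers = updated
    labels = assign(centers)
    # Inertia: within-cluster sum of squared distances, the quantity plotted in an elbow curve.
    inertia = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, inertia


def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0 by convention."""
    scores = []
    for i, p in enumerate(points):
        sums, counts = {}, {}
        for j, q in enumerate(points):
            if i != j:
                sums[labels[j]] = sums.get(labels[j], 0.0) + math.dist(p, q)
                counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        if counts.get(own, 0) == 0:
            scores.append(0.0)
            continue
        a = sums[own] / counts[own]                               # mean intra-cluster distance
        b = min(sums[c] / counts[c] for c in sums if c != own)    # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Synthetic stand-in data: three well-separated 2-D blobs (not the diabetes dataset).
rng = random.Random(0)
true_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
          for cx, cy in true_centers for _ in range(30)]

inertias, silhouettes = {}, {}
for k in range(1, 7):
    labels, inertia = kmeans(points, k)
    inertias[k] = inertia
    if k >= 2:  # the silhouette is undefined for a single cluster
        silhouettes[k] = silhouette(points, labels)

best_k = max(silhouettes, key=silhouettes.get)
for k in sorted(inertias):
    print(k, round(inertias[k], 1), round(silhouettes.get(k, float("nan")), 3))
print("k with the highest mean silhouette:", best_k)
```

In practice the elbow is read off the inertia column as the k after which the decrease flattens, and the silhouette column gives a second, independent vote. A library implementation such as scikit-learn's `KMeans` and `silhouette_score` would normally replace the hand-written helpers.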