Diabetes Prediction Using Ensembling of Different Machine Learning Classifiers

Diabetes, a chronic illness, is a group of metabolic diseases characterized by a high level of sugar in the blood over a long period. The risk and severity of diabetes can be reduced significantly if precise early prediction is possible. The robust and accurate prediction of diabetes is...

Bibliographic Details
Main Authors: Md. Kamrul Hasan, Md. Ashraful Alam, Dola Das, Eklas Hossain, Mahmudul Hasan
Format: Article
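Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Diabetes prediction; ensembling classifier; machine learning; multilayer perceptron; missing values and outliers; Pima Indian Diabetic dataset
Online Access: https://ieeexplore.ieee.org/document/9076634/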
author Md. Kamrul Hasan
Md. Ashraful Alam
Dola Das
Eklas Hossain
Mahmudul Hasan
collection DOAJ
description Diabetes, a chronic illness, is a group of metabolic diseases characterized by a high level of sugar in the blood over a long period. The risk and severity of diabetes can be reduced significantly if precise early prediction is possible. Robust and accurate prediction of diabetes is highly challenging because of the limited amount of labeled data and the presence of outliers (or missing values) in diabetes datasets. In this article, we propose a robust framework for diabetes prediction that employs outlier rejection, filling of missing values, data standardization, feature selection, K-fold cross-validation, and several Machine Learning (ML) classifiers (k-nearest Neighbour, Decision Trees, Random Forest, AdaBoost, Naive Bayes, and XGBoost) as well as a Multilayer Perceptron (MLP). We also propose a weighted ensembling of the ML models to improve the prediction of diabetes, where each weight is estimated from the Area Under ROC Curve (AUC) of the corresponding model. AUC is chosen as the performance metric and is maximized during hyperparameter tuning using the grid search technique. All experiments were conducted under the same experimental conditions using the Pima Indian Diabetes Dataset. Across these experiments, the proposed ensembling classifier performs best, with sensitivity, specificity, false omission rate, diagnostic odds ratio, and AUC of 0.789, 0.934, 0.092, 66.234, and 0.950, respectively, outperforming the state-of-the-art results by 2.00% in AUC. The proposed framework for diabetes prediction outperforms the other methods discussed in the article. It can also provide better results on the same dataset, which can lead to better performance in diabetes prediction. Our source code for diabetes prediction is made publicly available.
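The AUC-weighted ensembling and the reported diagnostic metrics can be illustrated with a minimal sketch. This is not the authors' released code: the function names, the model names, and the choice to make each weight proportional to its model's AUC (normalized to sum to 1) are assumptions made for illustration, since the record only states that the weights are estimated from the AUC. The metric formulas are the standard confusion-matrix definitions.

```python
from typing import Dict, List, Sequence


def auc_weights(auc_by_model: Dict[str, float]) -> Dict[str, float]:
    """Turn per-model validation AUCs into weights that sum to 1.

    Assumption: weights are simply proportional to AUC.
    """
    total = sum(auc_by_model.values())
    if total <= 0:
        raise ValueError("AUC values must sum to a positive number")
    return {name: auc / total for name, auc in auc_by_model.items()}


def weighted_soft_vote(
    proba_by_model: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted average of each model's predicted positive-class probability."""
    n_samples = len(next(iter(proba_by_model.values())))
    combined = [0.0] * n_samples
    for name, probs in proba_by_model.items():
        if len(probs) != n_samples:
            raise ValueError(f"model {name!r} has a different number of samples")
        for i, p in enumerate(probs):
            combined[i] += weights[name] * p
    return combined


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard definitions of the metrics named in the description."""
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate
        "specificity": tn / (tn + fp),            # true negative rate
        "false_omission_rate": fn / (fn + tn),    # missed cases among negative calls
        "diagnostic_odds_ratio": (tp * tn) / (fp * fn),
    }


if __name__ == "__main__":
    # Hypothetical AUCs and probabilities, not values from the article.
    weights = auc_weights({"model_a": 0.9, "model_b": 0.6})
    scores = weighted_soft_vote(
        {"model_a": [0.8, 0.2], "model_b": [0.3, 0.7]}, weights
    )
    print(weights, scores)
    print(diagnostic_metrics(tp=8, fp=1, tn=9, fn=2))
```

In this sketch a model with a higher validation AUC contributes more to the combined probability; thresholding the combined score (for example at 0.5) would give the final class label.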
first_indexed 2024-12-14T14:56:48Z
format Article
id doaj.art-e13f8f8292de4d1b9e2ae7f99e1f1a4d
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-14T14:56:48Z
publishDate 2020-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-e13f8f8292de4d1b9e2ae7f99e1f1a4d 2022-12-21T22:56:58Z eng
IEEE, IEEE Access, ISSN 2169-3536, 2020-01-01, vol. 8, pp. 76516-76531
DOI 10.1109/ACCESS.2020.2989857, document 9076634
Diabetes Prediction Using Ensembling of Different Machine Learning Classifiers
Md. Kamrul Hasan (https://orcid.org/0000-0003-1292-4350), Department of Electrical and Electronic Engineering, Khulna University of Engineering & Technology, Khulna, Bangladesh
Md. Ashraful Alam, Department of Electrical and Electronic Engineering, Khulna University of Engineering & Technology, Khulna, Bangladesh
Dola Das, Department of Electrical and Electronic Engineering, Khulna University of Engineering & Technology, Khulna, Bangladesh
Eklas Hossain (https://orcid.org/0000-0003-2332-8095), Department of Electrical Engineering and Renewable Energy, Oregon Renewable Energy Center (OREC), Oregon Institute of Technology, Klamath Falls, OR, USA
Mahmudul Hasan (https://orcid.org/0000-0002-4386-0356), Department of Electrical and Electronic Engineering, Khulna University of Engineering & Technology, Khulna, Bangladesh
title Diabetes Prediction Using Ensembling of Different Machine Learning Classifiers
topic Diabetes prediction
ensembling classifier
machine learning
multilayer perceptron
missing values and outliers
Pima Indian Diabetic dataset
url https://ieeexplore.ieee.org/document/9076634/