Diabetes Prediction Using Ensembling of Different Machine Learning Classifiers
Main Authors: | Md. Kamrul Hasan, Md. Ashraful Alam, Dola Das, Eklas Hossain, Mahmudul Hasan |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2020-01-01 |
Series: | IEEE Access |
Online Access: | https://ieeexplore.ieee.org/document/9076634/ |
author | Md. Kamrul Hasan, Md. Ashraful Alam, Dola Das, Eklas Hossain, Mahmudul Hasan |
---|---|
collection | DOAJ |
description | Diabetes, a chronic illness, is a group of metabolic diseases characterized by high blood sugar levels over a prolonged period. The risk and severity of diabetes can be reduced significantly if precise early prediction is possible. Robust and accurate prediction of diabetes is highly challenging because of the limited amount of labeled data and the presence of outliers and missing values in diabetes datasets. In this paper, we propose a robust framework for diabetes prediction that employs outlier rejection, filling of missing values, data standardization, feature selection, K-fold cross-validation, different Machine Learning (ML) classifiers (k-Nearest Neighbour, Decision Trees, Random Forest, AdaBoost, Naive Bayes, and XGBoost), and a Multilayer Perceptron (MLP). We also propose a weighted ensemble of the different ML models to improve diabetes prediction, where each model's weight is estimated from its Area Under the ROC Curve (AUC). AUC is chosen as the performance metric and is maximized during hyperparameter tuning using grid search. All experiments were conducted under the same experimental conditions using the Pima Indian Diabetes Dataset. Across these experiments, the proposed ensemble classifier performs best, with a sensitivity, specificity, false omission rate, diagnostic odds ratio, and AUC of 0.789, 0.934, 0.092, 66.234, and 0.950, respectively, outperforming the state-of-the-art results by 2.00% in AUC. The proposed framework also outperforms the other methods discussed in the article on the same dataset, which can lead to better performance in diabetes prediction. Our source code for diabetes prediction is publicly available. |
first_indexed | 2024-12-14T14:56:48Z |
format | Article |
id | doaj.art-e13f8f8292de4d1b9e2ae7f99e1f1a4d |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-14T14:56:48Z |
publishDate | 2020-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
volume | 8 |
pages | 76516-76531 |
doi | 10.1109/ACCESS.2020.2989857 |
orcid | Md. Kamrul Hasan: https://orcid.org/0000-0003-1292-4350; Eklas Hossain: https://orcid.org/0000-0003-2332-8095; Mahmudul Hasan: https://orcid.org/0000-0002-4386-0356 |
affiliation | Md. Kamrul Hasan, Md. Ashraful Alam, Dola Das, and Mahmudul Hasan: Department of Electrical and Electronic Engineering, Khulna University of Engineering & Technology, Khulna, Bangladesh. Eklas Hossain: Department of Electrical Engineering and Renewable Energy, Oregon Renewable Energy Center (OREC), Oregon Institute of Technology, Klamath Falls, OR, USA |
title | Diabetes Prediction Using Ensembling of Different Machine Learning Classifiers |
topic | Diabetes prediction; ensembling classifier; machine learning; multilayer perceptron; missing values and outliers; Pima Indian Diabetic dataset |
url | https://ieeexplore.ieee.org/document/9076634/ |
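The description says the ensemble weights are "estimated from the corresponding Area Under ROC Curve (AUC)" of each model but gives no formula. The following is a minimal plain-Python sketch under stated assumptions, not the authors' released code: it assumes each weight is the model's AUC divided by the sum of all models' AUCs, that the ensemble output is the weighted average of the models' predicted probabilities, and that a 0.5 threshold turns probabilities into labels. It also spells out the standard confusion-matrix definitions of the reported metrics (sensitivity, specificity, false omission rate, diagnostic odds ratio). The function names and the toy data are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_weights(aucs: Sequence[float]) -> list[float]:
    """ASSUMPTION: weight_i = AUC_i / sum_j AUC_j, so the weights sum to 1."""
    total = sum(aucs)
    return [a / total for a in aucs]


def weighted_ensemble(probas: Sequence[Sequence[float]], weights: Sequence[float]) -> list[float]:
    """Per-sample weighted average of each model's positive-class probability."""
    n_samples = len(probas[0])
    return [sum(w * p[i] for w, p in zip(weights, probas)) for i in range(n_samples)]


def confusion_metrics(labels: Sequence[int], preds: Sequence[int]) -> dict[str, float]:
    """Metrics named in the description, from binary labels and binary predictions."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "false_omission_rate": fn / (fn + tn),  # share of predicted negatives that are actually positive
        # DOR = (TP * TN) / (FP * FN); this sketch returns inf when FP or FN is 0.
        "diagnostic_odds_ratio": (tp * tn) / (fp * fn) if fp * fn else float("inf"),
    }


if __name__ == "__main__":
    # Made-up toy data for illustration only (not the Pima Indian Diabetes Dataset).
    y = [1, 0, 1, 1, 0, 0, 1, 0]
    model_probas = [
        [0.9, 0.2, 0.7, 0.3, 0.3, 0.6, 0.8, 0.1],  # hypothetical model A
        [0.6, 0.4, 0.8, 0.7, 0.5, 0.7, 0.4, 0.3],  # hypothetical model B
    ]
    # For brevity, AUCs are computed on the same toy data that is scored.
    aucs = [roc_auc(y, p) for p in model_probas]
    weights = auc_weights(aucs)
    ensemble = weighted_ensemble(model_probas, weights)
    preds = [1 if p >= 0.5 else 0 for p in ensemble]  # ASSUMPTION: 0.5 decision threshold
    print("per-model AUC:", aucs)
    print("weights:", weights)
    print("ensemble AUC:", roc_auc(y, ensemble))
    print(confusion_metrics(y, preds))
```

In a real pipeline the per-model AUCs would more sensibly come from held-out folds (the description mentions K-fold cross-validation) rather than from the data being scored, so that the weights do not reward overfitting.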