An Effective Methodology for Diabetes Prediction in the Case of Class Imbalance
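Main authors: | Borislava Toleva, Ivan Atanasov, Ivan Ivanov, Vincent Hooper |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Collection: | Bioengineering |
Subjects: | class imbalance; classification; cross validation; resample; shuffle |
Online access: | https://www.mdpi.com/2306-5354/12/1/35 |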
author | Borislava Toleva, Ivan Atanasov, Ivan Ivanov, Vincent Hooper |
---|---|
collection | DOAJ |
description | Diabetes causes an increase in blood sugar levels, which leads to damage in various parts of the human body. Diabetes data are used not only to gain a deeper understanding of treatment mechanisms but also to predict the probability that a person will develop the disease. This paper proposes a novel methodology for classification under heavy class imbalance, such as that observed in the PIMA diabetes dataset. The methodology adds two novel steps, resampling and random shuffling, before the classification model is defined. It is tested with two cross-validation schemes used in cases of class imbalance: k-fold cross validation and stratified k-fold cross validation (the latter keeps the class proportions of the full dataset in every fold). Our findings suggest that, with imbalanced data, randomly shuffling the data before a train/test split can improve evaluation metrics. Our methodology can outperform existing machine learning algorithms and complex deep learning models. It offers a simple and fast way to predict labels under class imbalance: it requires neither additional class-balancing techniques nor preselection of important variables, which saves time and keeps the model easy to analyze. This makes it an effective methodology for both initial and further modeling of imbalanced data. Moreover, the methodology shows how to make machine learning models based on standard approaches more effective and more reliable. |
first_indexed | 2025-02-16T18:56:02Z |
id | doaj.art-b497a3f3b3d2457ebeaf490b86e6727a |
institution | Directory of Open Access Journals |
issn | 2306-5354 |
last_indexed | 2025-02-16T18:56:02Z |
doi | 10.3390/bioengineering12010035 |
volume | 12 |
issue | 1 |
article_number | 35 |
affiliation | Borislava Toleva, Ivan Atanasov, Ivan Ivanov: Faculty of Economics and Business Administration, Sofia University St. Kl. Ohridski, 1113 Sofia, Bulgaria; Vincent Hooper: SP Jain Global School of Management, Academic City, Dubai P.O. Box 502345, United Arab Emirates |
topic | class imbalance; classification; cross validation; resample; shuffle |
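The abstract credits part of the improvement to shuffling the data before a train/test split and compares plain k-fold with stratified k-fold cross validation. The minimal sketch below illustrates why fold composition matters under class imbalance. It is not the authors' code and does not reproduce their resampling step. It assumes binary labels and uses a synthetic label vector sorted by class as a worst case: 500 negatives followed by 268 positives, which matches the commonly cited class counts of the PIMA dataset. The helper names `kfold_indices`, `stratified_kfold_indices`, and `positive_rate` are hypothetical. For each splitting scheme, the sketch prints the share of positive cases in each test fold.

```python
import random


def kfold_indices(n, k, shuffle=False, seed=0):
    """Split indices 0..n-1 into k contiguous folds, optionally shuffling first."""
    idx = list(range(n))
    if shuffle:
        # Seeded shuffle so the split is reproducible.
        random.Random(seed).shuffle(idx)
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(idx[start:start + size])
        start += size
    return folds


def stratified_kfold_indices(labels, k, seed=0):
    """Shuffle indices within each class, then deal them round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Round-robin dealing keeps each class's count per fold within one of equal.
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def positive_rate(fold, labels):
    """Fraction of label-1 samples in a fold."""
    return sum(labels[i] for i in fold) / len(fold)


# Hypothetical labels sorted by class: 500 negatives, then 268 positives.
labels = [0] * 500 + [1] * 268

for name, folds in [
    ("k-fold, no shuffle", kfold_indices(len(labels), 5)),
    ("k-fold, shuffled", kfold_indices(len(labels), 5, shuffle=True, seed=42)),
    ("stratified k-fold", stratified_kfold_indices(labels, 5, seed=42)),
]:
    rates = [round(positive_rate(f, labels), 3) for f in folds]
    print(f"{name:20s} positive rate per test fold: {rates}")
```

With labels sorted by class, unshuffled k-fold gives positive rates of 0.0, 0.0, 0.0, 0.752, and 1.0. Three test folds therefore contain no positive cases at all. Shuffling brings each fold close to the overall rate of about 0.349, and stratification holds it between 0.346 and 0.351. In scikit-learn, the same behaviors are available through `KFold(shuffle=True, random_state=...)` and `StratifiedKFold`. Note that `KFold` does not shuffle by default.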