An Effective Methodology for Diabetes Prediction in the Case of Class Imbalance

Diabetes causes an increase in the level of blood sugar, which leads to damage to various parts of the human body. Diabetes data are used not only for providing a deeper understanding of the treatment mechanisms but also for predicting the probability that one might become sick. This paper proposes...

Bibliographic Details
Main authors: Borislava Toleva, Ivan Atanasov, Ivan Ivanov, Vincent Hooper
Format: Article
Language: English
Published: MDPI AG, 2025-01-01
Series: Bioengineering
Subjects:
Online access: https://www.mdpi.com/2306-5354/12/1/35
_version_ 1826858221663420416
author Borislava Toleva
Ivan Atanasov
Ivan Ivanov
Vincent Hooper
collection DOAJ
description Diabetes causes an increase in the level of blood sugar, which leads to damage to various parts of the human body. Diabetes data are used not only for providing a deeper understanding of the treatment mechanisms but also for predicting the probability that one might become sick. This paper proposes a novel methodology to perform classification in the case of heavy class imbalance, as observed in the PIMA diabetes dataset. The proposed methodology uses two novel steps, namely resampling and random shuffling, prior to defining the classification model. The methodology is tested with two versions of cross validation that are appropriate in cases of class imbalance: k-fold cross validation and stratified k-fold cross validation. Our findings suggest that, when the data are imbalanced, shuffling them randomly prior to a train/test split can help improve estimation metrics. Our methodology can outperform existing machine learning algorithms and complex deep learning models. Applying our proposed methodology is a simple and fast way to predict labels with class imbalance. It does not require additional techniques to balance classes. It does not involve preselecting important variables, which saves time and makes the model easy to analyze. This makes it an effective methodology for initial and further modeling of data with class imbalance. Moreover, our methodology shows how to increase the effectiveness of machine learning models based on standard approaches and make them more reliable.
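The description mentions resampling, random shuffling, and two cross-validation schemes. The sketch below is not the authors' code; it only illustrates, on toy labels, why shuffling or stratifying matters when rows arrive ordered by class. It assumes that "resampling" means a bootstrap draw with replacement, which the abstract does not confirm, and the 65/35 class ratio and all function names are hypothetical.

```python
import random
from collections import defaultdict


def resample(rows, n, rng):
    """Bootstrap draw of n rows with replacement (an assumed reading of 'resampling')."""
    return [rng.choice(rows) for _ in range(n)]


def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds, with no shuffling."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def stratified_kfold_indices(labels, k):
    """Deal each class's indices round-robin so folds roughly keep the class ratio."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds


def positive_rate_per_fold(labels, folds):
    """Share of positive (1) labels in each fold."""
    return [sum(labels[i] for i in fold) / len(fold) for fold in folds]


rng = random.Random(0)

# Toy labels sorted by class: 65 negatives, then 35 positives (hypothetical ratio).
labels = [0] * 65 + [1] * 35

# Plain k-fold on sorted data: some folds contain a single class.
plain = positive_rate_per_fold(labels, kfold_indices(len(labels), 5))

# Stratified k-fold: every fold keeps the overall 35% positive rate.
strat = positive_rate_per_fold(labels, stratified_kfold_indices(labels, 5))

# Resample, then shuffle, then plain k-fold: classes are mixed across folds.
sample = resample(labels, len(labels), rng)
rng.shuffle(sample)
mixed = positive_rate_per_fold(sample, kfold_indices(len(sample), 5))

print("plain k-fold:      ", plain)
print("stratified k-fold: ", strat)
print("resample + shuffle:", mixed)
```

On the sorted toy labels, plain k-fold yields three folds with no positives at all, whereas the stratified split keeps 35% positives in every fold. The resampled and shuffled variant falls in between, with fold rates that depend on the random seed.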
first_indexed 2025-02-16T18:56:02Z
format Article
id doaj.art-b497a3f3b3d2457ebeaf490b86e6727a
institution Directory Open Access Journal
issn 2306-5354
language English
last_indexed 2025-02-16T18:56:02Z
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Bioengineering
spelling doaj.art-b497a3f3b3d2457ebeaf490b86e6727a; 2025-01-24T13:23:02Z; eng; MDPI AG; Bioengineering; ISSN 2306-5354; 2025-01-01; vol. 12, no. 1, art. 35; DOI 10.3390/bioengineering12010035
An Effective Methodology for Diabetes Prediction in the Case of Class Imbalance
Borislava Toleva, Faculty of Economics and Business Administration, Sofia University, St. Kl. Ohridski, 1113 Sofia, Bulgaria
Ivan Atanasov, Faculty of Economics and Business Administration, Sofia University, St. Kl. Ohridski, 1113 Sofia, Bulgaria
Ivan Ivanov, Faculty of Economics and Business Administration, Sofia University, St. Kl. Ohridski, 1113 Sofia, Bulgaria
Vincent Hooper, SP Jain Global School of Management, Academic City, Dubai P.O. Box 502345, United Arab Emirates
https://www.mdpi.com/2306-5354/12/1/35
class imbalance; classification; cross validation; resample; shuffle
title An Effective Methodology for Diabetes Prediction in the Case of Class Imbalance
topic class imbalance
classification
cross validation
resample
shuffle
url https://www.mdpi.com/2306-5354/12/1/35