An Effective Methodology for Diabetes Prediction in the Case of Class Imbalance

Bibliographic details
Main authors:	Borislava Toleva, Ivan Atanasov, Ivan Ivanov, Vincent Hooper
Format:	Article
Language:	English
Published:	MDPI AG, 2025-01-01
Collection:	Bioengineering
Subjects:	class imbalance; classification; cross validation; resample; shuffle
Online access:	https://www.mdpi.com/2306-5354/12/1/35

Affiliations:	Borislava Toleva, Ivan Atanasov, Ivan Ivanov: Faculty of Economics and Business Administration, Sofia University St. Kliment Ohridski, 1113 Sofia, Bulgaria; Vincent Hooper: SP Jain Global School of Management, Academic City, Dubai, P.O. Box 502345, United Arab Emirates
Source:	Bioengineering, vol. 12, no. 1, article 35 (2025)
ISSN:	2306-5354
DOI:	10.3390/bioengineering12010035
Indexed in:	DOAJ (Directory of Open Access Journals)

Abstract
Diabetes causes an increase in the level of blood sugar, which leads to damage to various parts of the human body. Diabetes data are used not only to provide a deeper understanding of treatment mechanisms but also to predict the probability that a person will become ill. This paper proposes a novel methodology for classification under heavy class imbalance, as observed in the PIMA diabetes dataset. The proposed methodology adds two novel steps, resampling and random shuffling, before the classification model is defined. The methodology is tested with two versions of cross validation that are appropriate in cases of class imbalance: k-fold cross validation and stratified k-fold cross validation. Our findings suggest that, with imbalanced data, randomly shuffling the data prior to a train/test split can help improve estimation metrics. Our methodology can outperform existing machine learning algorithms and complex deep learning models. Applying the proposed methodology is a simple and fast way to predict labels under class imbalance. It does not require additional techniques to balance classes, and it does not involve preselecting important variables, which saves time and keeps the model easy to analyze. This makes it an effective methodology for both initial and further modeling of class-imbalanced data. Moreover, our methodologies show how to increase the effectiveness of machine learning models built on standard approaches and make them more reliable.
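
The abstract contrasts plain k-fold with stratified k-fold cross validation and stresses shuffling before splitting. The stdlib-only Python sketch below is not the authors' code and does not reproduce their resampling step, which this record does not specify; it only shows how each fold scheme distributes a minority class. The function names (`kfold_indices`, `stratified_kfold_indices`, `positive_share`) are hypothetical, the stratified allocation follows the spirit of scikit-learn's `StratifiedKFold` rather than its exact algorithm, and the label vector assumes the 500 negative / 268 positive split commonly reported for the PIMA dataset.

```python
import random
from collections import defaultdict


def kfold_indices(n, k, shuffle=True, seed=0):
    """Plain k-fold: optionally shuffle row indices, then cut them into k contiguous folds."""
    idx = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(idx)
    # Fold sizes differ by at most one; the first n % k folds get the extra row.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(idx[start:start + size])
        start += size
    return folds


def stratified_kfold_indices(labels, k, seed=0):
    """Stratified k-fold: shuffle each class separately, then deal its rows
    round-robin over the k folds so each fold keeps roughly the overall class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[(offset + j) % k].append(i)
        # The next class starts dealing where this one stopped, keeping fold sizes balanced.
        offset += len(members)
    return folds


def positive_share(fold, labels, positive=1):
    """Fraction of rows in a fold whose label equals `positive`."""
    return sum(labels[i] == positive for i in fold) / len(fold)


if __name__ == "__main__":
    # Assumed PIMA-like labels, sorted by class: 500 negatives, then 268 positives.
    labels = [0] * 500 + [1] * 268
    print("overall", round(sum(labels) / len(labels), 3))
    schemes = {
        "k-fold, no shuffle": kfold_indices(len(labels), 5, shuffle=False),
        "k-fold, shuffled": kfold_indices(len(labels), 5, shuffle=True),
        "stratified k-fold": stratified_kfold_indices(labels, 5),
    }
    for name, folds in schemes.items():
        print(name, [round(positive_share(f, labels), 3) for f in folds])
```

Because the assumed labels are sorted by class, unshuffled contiguous folds put no positives in the first three folds and only positives in the last one, while shuffled folds land roughly near the overall 34.9% positive rate and stratified folds stay within half a percentage point of it. This illustrates fold composition only; it does not demonstrate the metric improvements reported in the paper.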

