Stacking ensemble approach to diagnosing the disease of diabetes

Background: Diabetes is a very common disease today and has become a worrying focus of global public health; the number of people with diabetes worldwide is estimated to have reached 415 million. Objective: Propose a method and 4 combined models based on Stacking ensemb...

Bibliographic Details
Main Authors:	Alfredo Daza, Carlos Fidel Ponce Sánchez, Gonzalo Apaza-Perez, Juan Pinto, Karoline Zavaleta Ramos
Format:	Article
Language:	English
Published:	Elsevier 2024-01-01
Series:	Informatics in Medicine Unlocked
Subjects:	Machine learning; Prediction; Diabetes; Oversampling; Hyperparameters; Stacking
Online Access:	http://www.sciencedirect.com/science/article/pii/S2352914823002733

author	Alfredo Daza; Carlos Fidel Ponce Sánchez; Gonzalo Apaza-Perez; Juan Pinto; Karoline Zavaleta Ramos
collection	DOAJ
description	Background: Diabetes is a very common disease today and has become a worrying focus of global public health; the number of people with diabetes worldwide is estimated to have reached 415 million. Objective: To propose a method and 4 combined models based on a Stacking ensemble to diagnose diabetes. In addition, a web interface was developed with the best model proposed in this study. Methods: The Diabetes Dataset, composed of 768 patient records, was used. The data were pre-processed using the Python programming language. To balance the data, they were divided into 4 values and an oversampling method was applied to distribute the data proportionally. The balanced data were then split using cross-validation for training, and the models were calibrated. Seven independent base algorithms were used and 4 combined algorithms based on Stacking were proposed; finally, the models were evaluated with their respective metrics. Results: Stacking 1A (Logistic regression) with oversampling reached the best values: Accuracy = 91.5 %, Sensitivity = 91.6 %, F1-Score = 91.49 % and Precision = 91.5 %. For the ROC Curve metric, Stacking 1A (Logistic regression) with oversampling, Stacking 2A (Random Forest) with oversampling, and Random Forest (independent) reached the best value, 97 %. Conclusions: Implementing 4 stacking models with the oversampling method helps to make an adequate diagnosis of diabetes. Using the combined method improved diabetes prediction, surpassing the performance of the independent algorithms used.
first_indexed	2024-03-08T12:08:21Z
format	Article
id	doaj.art-0fef577c55d1459ebe5fb68748f817f0
institution	Directory of Open Access Journals
issn	2352-9148
language	English
last_indexed	2024-03-08T12:08:21Z
publishDate	2024-01-01
publisher	Elsevier
record_format	Article
series	Informatics in Medicine Unlocked
spelling	doaj.art-0fef577c55d1459ebe5fb68748f817f0; 2024-01-23T04:15:46Z; eng; Elsevier; Informatics in Medicine Unlocked; 2352-9148; 2024-01-01; volume 44, article 101427; Stacking ensemble approach to diagnosing the disease of diabetes
	Alfredo Daza: Faculty of Engineering and Architecture, School of Systems Engineering, Universidad César Vallejo, Lima, Peru (corresponding author)
	Carlos Fidel Ponce Sánchez: Faculty of Industrial and Systems Engineering, School of Industrial Engineering, Universidad Nacional de Ingeniería, Lima, Peru
	Gonzalo Apaza-Perez: Graduate School, Professional School of Systems Engineering, Universidad Nacional del Altiplano, UNAP, Puno, Peru
	Juan Pinto: Faculty of Systems Engineering, Professional School of System Engineering, Universidad Andina Néstor Cáceres Velasquez, Puno, Peru
	Karoline Zavaleta Ramos: Faculty of Business Sciences, School of Management, Universidad César Vallejo, Trujillo, Peru
title	Stacking ensemble approach to diagnosing the disease of diabetes
topic	Machine learning; Prediction; Diabetes; Oversampling; Hyperparameters; Stacking
url	http://www.sciencedirect.com/science/article/pii/S2352914823002733
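
The description names an oversampling step and four reported metrics (Accuracy, Sensitivity, Precision, F1-Score) without giving their details. The sketch below illustrates both ideas in plain Python under stated assumptions: it uses simple random oversampling (the record does not say which oversampling method the authors applied), the standard binary definitions of the metrics (the record does not say whether the reported values are averaged across classes), and toy data invented for the example. The function names `random_oversample` and `binary_metrics` are hypothetical and do not come from the article.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest one.
    Assumption: plain random oversampling, not necessarily the authors' method."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Original rows belonging to this class, used as the sampling pool.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, sensitivity (recall) and F1 from confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": 2 * precision * sensitivity / denom if denom else 0.0,
    }


# Toy, made-up data: 7 negative and 3 positive records with one feature each.
rows = [[i] for i in range(10)]
labels = [0] * 7 + [1] * 3
bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))  # both classes now have 7 rows

# Toy predictions: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
m = binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
print(m)  # every metric equals 0.75 for this toy case
```

With a real dataset, the same two helpers would be applied to the training records and to a model's predictions respectively; the stacking models themselves are not reproduced here.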

Similar Items