Exploring Hyper-Parameters and Feature Selection for Predicting Non-Communicable Chronic Disease Using Stacking Classifier
Non-communicable disease, especially chronic disease, is the most common factor of complication of deteriorating physical health and the state of one’s mind. It is also a prominent cause of illness and mortality around the world. Primarily chronic disease is preventable at a particular st...
Main Authors: | Pooja Yadav, S. C. Sharma, Rajesh Mahadeva, Shashikant P. Patole |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023-01-01 |
Series: | IEEE Access |
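Subjects: | Chronic disease, feature selection, hyperparameter tuning, machine learning, non-communicable diseases, prediction model |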
Online Access: | https://ieeexplore.ieee.org/document/10196339/ |
_version_ | 1797746142757257216 |
---|---|
author | Pooja Yadav, S. C. Sharma, Rajesh Mahadeva, Shashikant P. Patole |
author_facet | Pooja Yadav, S. C. Sharma, Rajesh Mahadeva, Shashikant P. Patole |
author_sort | Pooja Yadav |
collection | DOAJ |
description | Non-communicable disease, especially chronic disease, is the most common factor contributing to deteriorating physical health and state of mind. It is also a prominent cause of illness and mortality around the world. Chronic disease is largely preventable at a particular stage, although its occurrence is critical. Illness prediction models have been created to assist clinicians and patients in making clinical decisions. A chronic disease prediction model takes many risk variables into account to determine an individual's illness risk. Machine learning approaches have made it possible to predict chronic disease early using collected Electronic Health Record (EHR) data. This paper focuses on the diabetes dataset extracted from Kaggle and two unseen real datasets. We implemented the Synthetic Minority Over-Sampling Technique (SMOTE) algorithm to balance the dataset and explored Boruta as the feature selection method. To tune the hyper-parameters of the different algorithms, we propose an improved technique that combines the Grid Search method with the Grey Wolf Optimization algorithm. The Grid Search method requires extensive searching, while the Grey Wolf Optimization algorithm is easy to integrate, fast to search, and highly accurate. Nine conventional classification techniques are evaluated in this paper. This research concentrates on the Stacking Classifier to assess the performance of the prediction model that produces the best results. The proposed model achieved its highest F1-scores of 98.84% on the PIMA dataset, 98% after validation on the synthetic dataset, 97.3% on the ADRC dataset, and 96.20% on the FHD dataset. To the best of our knowledge, no previous work has focused on this kind of technique together with these two datasets. The outcomes of the comparison experiment on the PIMA dataset reveal that the proposed strategy performs better. This study also provides an interpretation of the proposed model and conducts an ethical assessment of what explainability means for the use of machine learning models in clinical practice. |
first_indexed | 2024-03-12T15:32:48Z |
format | Article |
id | doaj.art-c723168483e94bcf8781571aa29374f2 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-12T15:32:48Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-c723168483e94bcf8781571aa29374f2; 2023-08-09T23:01:43Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; vol. 11, pp. 80030-80055; 10.1109/ACCESS.2023.3299332; 10196339; Exploring Hyper-Parameters and Feature Selection for Predicting Non-Communicable Chronic Disease Using Stacking Classifier; Pooja Yadav (https://orcid.org/0000-0002-8305-9512), Electronics and Computer Discipline, DPT, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India; S. C. Sharma (https://orcid.org/0000-0001-8093-7319), Electronics and Computer Discipline, DPT, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India; Rajesh Mahadeva (https://orcid.org/0000-0001-8952-7172), Department of Physics, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates; Shashikant P. Patole (https://orcid.org/0000-0001-6669-6635), Department of Physics, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates; https://ieeexplore.ieee.org/document/10196339/; Chronic disease, feature selection, hyperparameter tuning, machine learning, non-communicable diseases, prediction model |
title | Exploring Hyper-Parameters and Feature Selection for Predicting Non-Communicable Chronic Disease Using Stacking Classifier |
title_sort | exploring hyper parameters and feature selection for predicting non communicable chronic disease using stacking classifier |
topic | Chronic disease, feature selection, hyperparameter tuning, machine learning, non-communicable diseases, prediction model |
url | https://ieeexplore.ieee.org/document/10196339/ |
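
Illustrative note on the tuning approach named in the description: the record states that Grid Search is combined with Grey Wolf Optimization but does not say how. The sketch below shows one plausible reading, assumed here purely for illustration: a coarse grid picks a starting point, and a basic Grey Wolf Optimizer then refines it inside the same bounds. The objective `toy_loss`, the bounds, the grid values, and all function names are hypothetical stand-ins and do not come from the paper.

```python
import itertools
import random


def toy_loss(params):
    """Hypothetical stand-in for a validation loss; minimum at (0.3, 2.7)."""
    x, y = params
    return (x - 0.3) ** 2 + (y - 2.7) ** 2


def grid_search(loss, axes):
    """Evaluate every point of a coarse grid and return the best one."""
    return min(itertools.product(*axes), key=loss)


def clip(value, low, high):
    """Keep a coordinate inside its search interval."""
    return max(low, min(high, value))


def grey_wolf_optimize(loss, bounds, start, n_wolves=12, n_iter=60, seed=0):
    """Minimise `loss` inside `bounds` with a basic Grey Wolf Optimizer.

    `start` (here the best grid point) is placed in the initial pack, so the
    returned point is never worse than the grid-search result.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    wolves = [list(start)] + [
        [rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves - 1)
    ]
    best = min(wolves, key=loss)[:]
    for it in range(n_iter):
        # The three fittest wolves (alpha, beta, delta) guide the pack.
        leaders = [w[:] for w in sorted(wolves, key=loss)[:3]]
        a = 2.0 * (1 - it / n_iter)  # decreases linearly from 2 towards 0
        for w in wolves:
            for d in range(dim):
                pulls = []
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    dist = abs(C * leader[d] - w[d])
                    pulls.append(leader[d] - A * dist)
                lo, hi = bounds[d]
                # New position: mean of the three leader-guided moves.
                w[d] = clip(sum(pulls) / 3, lo, hi)
        candidate = min(wolves, key=loss)
        if loss(candidate) < loss(best):
            best = candidate[:]
    return best


# Hypothetical two-parameter search space.
bounds = [(0.0, 1.0), (0.0, 5.0)]
axes = [[0.0, 0.5, 1.0], [0.0, 2.5, 5.0]]

coarse = grid_search(toy_loss, axes)
refined = grey_wolf_optimize(toy_loss, bounds, coarse)
print("grid search:", coarse, toy_loss(coarse))
print("after GWO:  ", refined, toy_loss(refined))
```

In a real tuning run, `toy_loss` would be replaced by a cross-validated error of the classifier being tuned; seeding the pack with the grid result is a design choice of this sketch, not something the record confirms.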