Early Thyroid Risk Prediction by Data Mining and Ensemble Classifiers

Thyroid disease is among the most prevalent endocrinopathies worldwide. As the thyroid gland controls human metabolism, thyroid illness is a matter of concern for human health. To save time and reduce error rates, an automatic, reliable, and accurate thyroid identification machine-learning (ML) syst...

Bibliographic Details
Main Author: Mohammad H. Alshayeji
Format: Article
Language: English
Published: MDPI AG 2023-09-01
Series: Machine Learning and Knowledge Extraction
Subjects: machine learning; thyroid; data mining; ensemble model; feature engineering
Online Access: https://www.mdpi.com/2504-4990/5/3/61
author Mohammad H. Alshayeji
collection DOAJ
description Thyroid disease is among the most prevalent endocrinopathies worldwide. Because the thyroid gland controls human metabolism, thyroid illness is a significant concern for human health. To save time and reduce error rates, an automatic, reliable, and accurate machine-learning (ML) system for identifying thyroid disease is essential. The proposed model aims to address limitations of existing work, such as the lack of detailed feature analysis and visualization, and to improve prediction accuracy and reliability. A public thyroid illness dataset containing 29 clinical features, taken from the University of California, Irvine ML repository, was used. These clinical features were used to build an ML model that predicts thyroid illness by analyzing early symptoms, replacing manual analysis of these attributes. Feature analysis and visualization facilitate an understanding of the role each feature plays in thyroid prediction tasks. In addition, overfitting was addressed through 5-fold cross-validation and data balancing with the synthetic minority oversampling technique (SMOTE). Ensemble learning supports the reliability of the prediction model because multiple classifiers are involved in each prediction decision. With the boosting method, the proposed model achieved 99.5% accuracy, 99.39% sensitivity, and 99.59% specificity, making it applicable to real-time computer-aided diagnosis (CAD) systems that ease diagnosis and promote early treatment.
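The three figures reported in the description (accuracy, sensitivity, specificity) are standard binary-classification metrics derived from a confusion matrix. The following is a minimal sketch of how they are usually computed; it is not the author's code. The function names and the toy labels are hypothetical and chosen only for illustration, and the label 1 is assumed to mean "thyroid disease present".

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Dict[str, int]:
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # predicted diseased, actually diseased
            else:
                fp += 1  # predicted diseased, actually healthy
        else:
            if truth == positive:
                fn += 1  # predicted healthy, actually diseased
            else:
                tn += 1  # predicted healthy, actually healthy
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


def binary_metrics(counts: Dict[str, int]) -> Dict[str, float]:
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives).

    Assumes both classes occur in y_true, so no denominator is zero.
    """
    tp, fp, tn, fn = counts["tp"], counts["fp"], counts["tn"], counts["fn"]
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the article.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(confusion_counts(y_true, y_pred)))
```

Under cross-validation, such metrics are typically computed on each held-out fold and then averaged; whether the article reports them that way is not stated in this record.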
first_indexed 2024-03-10T22:32:05Z
format Article
id doaj.art-783299268792484badec0808fb72b85e
institution Directory of Open Access Journals
issn 2504-4990
language English
last_indexed 2024-03-10T22:32:05Z
publishDate 2023-09-01
publisher MDPI AG
record_format Article
series Machine Learning and Knowledge Extraction
spelling doaj.art-783299268792484badec0808fb72b85e; 2023-11-19T11:41:52Z; eng; MDPI AG; Machine Learning and Knowledge Extraction; ISSN 2504-4990; 2023-09-01; vol. 5, no. 3, pp. 1195-1213; DOI 10.3390/make5030061; Early Thyroid Risk Prediction by Data Mining and Ensemble Classifiers; Mohammad H. Alshayeji (Computer Engineering Department, College of Engineering and Petroleum, Kuwait University, Safat, P.O. Box 5969, Kuwait City 13060, Kuwait); https://www.mdpi.com/2504-4990/5/3/61; machine learning; thyroid; data mining; ensemble model; feature engineering
title Early Thyroid Risk Prediction by Data Mining and Ensemble Classifiers
topic machine learning
thyroid
data mining
ensemble model
feature engineering