Predicting Thalassemia Using Feature Selection Techniques: A Comparative Analysis

Thalassemia represents one of the most common genetic disorders worldwide, characterized by defects in hemoglobin synthesis. The affected individuals suffer from malfunctioning of one or more of the four globin genes, leading to chronic hemolytic anemia, an imbalance in the hemoglobin chain ratio, i...

Bibliographic Details
Main Authors: Muniba Saleem, Waqar Aslam, Muhammad Ikram Ullah Lali, Hafiz Tayyab Rauf, Emad Abouel Nasr
Format: Article
Language: English
Published: MDPI AG 2023-11-01
Series: Diagnostics
Subjects: thalassemia; classification; feature selection; filter-based; wrapper and embedded method
Online Access: https://www.mdpi.com/2075-4418/13/22/3441
collection DOAJ
description Thalassemia is one of the most common genetic disorders worldwide and is characterized by defects in hemoglobin synthesis. Affected individuals have a malfunction in one or more of the four globin genes, which leads to chronic hemolytic anemia, an imbalance in the hemoglobin chain ratio, iron overload, and ineffective erythropoiesis. Despite these challenges, recent advances in diagnosis, therapy, and transfusion support have significantly improved the prognosis for thalassemia patients. This research empirically evaluates models built with classification methods and examines the effectiveness of the relevant features derived using various machine-learning techniques. Five feature selection approaches, namely Chi-Square (χ²), Exploratory Factor Score (EFS), tree-based Recursive Feature Elimination (RFE), gradient-based RFE, and Linear Regression Coefficient, were employed to determine the optimal feature set. Nine classifiers, namely K-Nearest Neighbors (KNN), Decision Trees (DT), Gradient Boosting Classifier (GBC), Linear Regression (LR), AdaBoost, Extreme Gradient Boosting (XGB), Random Forest (RF), Light Gradient Boosting Machine (LGBM), and Support Vector Machine (SVM), were used to evaluate performance. Paired with the LR classifier, the χ² method registered 91.56% precision, 91.04% recall, and a 92.65% F-score. The results also indicate that combining over-sampling via the Synthetic Minority Over-sampling Technique (SMOTE) with RFE and 10-fold cross-validation markedly improves detection accuracy for αT patients. Notably, the Gradient Boosting Classifier (GBC) achieves 93.46% accuracy, 93.89% recall, and a 92.72% F1 score.
first_indexed 2024-03-09T16:53:59Z
id doaj.art-962dd384b5de48ecb7355cce978323f9
institution Directory Open Access Journal
issn 2075-4418
last_indexed 2024-03-09T16:53:59Z
spelling doaj.art-962dd384b5de48ecb7355cce978323f9
2023-11-24T14:37:36Z
eng
MDPI AG
Diagnostics
2075-4418
2023-11-01
Volume 13, Issue 22, Article 3441
10.3390/diagnostics13223441
Predicting Thalassemia Using Feature Selection Techniques: A Comparative Analysis
Muniba Saleem (Department of Computer Science & Information Technology, The Government Sadiq College Women University Bahawalpur, Bahawalpur 63100, Pakistan)
Waqar Aslam (Department of Information Security, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan)
Muhammad Ikram Ullah Lali (Department of Information Sciences, University of Education Lahore, Lahore 54770, Pakistan)
Hafiz Tayyab Rauf (Centre for Smart Systems, AI and Cybersecurity, Staffordshire University, Stoke-on-Trent ST4 2DE, UK)
Emad Abouel Nasr (Industrial Engineering Department, College of Engineering, King Saud University, Riyadh 11421, Saudi Arabia)
https://www.mdpi.com/2075-4418/13/22/3441
thalassemia
classification
feature selection
filter-based
wrapper and embedded method
topic thalassemia
classification
feature selection
filter-based
wrapper and embedded method