A Hybrid Feature Selection Optimization Model for High Dimension Data Classification
Feature selection is an NP-hard combinatorial problem, in which the number of possible feature subsets increases exponentially with the number of features. In the case of large dimensionality, the goal of feature selection is to determine the smallest possible features considering the most informati...
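As a quick illustration of the exponential growth mentioned above (the symbol $n$ for the number of features is introduced here and does not come from the record), the number of non-empty feature subsets is:

```latex
\[
\sum_{k=1}^{n} \binom{n}{k} = 2^{n} - 1
\]
```

For example, $n = 20$ features already give $2^{20} - 1 = 1{,}048{,}575$ candidate subsets, which is why exhaustive search is impractical for microarray data with thousands of genes.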
Main Authors: | Mohammed Qaraad, Souad Amjad, Ibrahim I. M. Manhrawy, Hanaa Fathi, Bayoumi Ali Hassan, Passent El Kafrawy |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2021-01-01 |
Series: | IEEE Access |
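Subjects: | Cancer classification; feature selection; genomic microarray data; parameter optimization; elastic net (EN); social ski-driver (SSD) |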
Online Access: | https://ieeexplore.ieee.org/document/9374967/ |
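The approach summarized in this record's description (Elastic Net feature selection followed by an SVM classifier) can be sketched with scikit-learn. This is only a minimal illustration under stated assumptions, not the authors' ENSVM implementation: the data are synthetic rather than genomic microarray data, the Elastic Net is fitted as a regression on the 0/1 class labels, only the cross-validated `ElasticNetCV` variant is shown (the Social Ski-Driver optimizer is not reproduced), and all parameter values are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in for a microarray matrix
# (few samples, many features, only a handful informative).
X, y = make_classification(
    n_samples=120, n_features=500, n_informative=10, n_redundant=0,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Elastic Net with cross-validated alpha and l1_ratio. Features whose
# absolute coefficient falls below the threshold are dropped.
selector = SelectFromModel(
    ElasticNetCV(l1_ratio=[0.5, 0.9], cv=5, max_iter=10000),
    threshold=1e-5,
)

# Scale, select features, then classify with an RBF-kernel SVM.
model = make_pipeline(StandardScaler(), selector, SVC(kernel="rbf"))
model.fit(X_train, y_train)

n_selected = int(model.named_steps["selectfrommodel"].get_support().sum())
accuracy = model.score(X_test, y_test)
print(f"kept {n_selected} of {X.shape[1]} features, test accuracy {accuracy:.2f}")
```

Putting the selector inside the pipeline means feature selection is refitted on the training split only, which avoids leaking test samples into the choice of features.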
_version_ | 1818736440996003840 |
---|---|
author | Mohammed Qaraad, Souad Amjad, Ibrahim I. M. Manhrawy, Hanaa Fathi, Bayoumi Ali Hassan, Passent El Kafrawy |
author_facet | Mohammed Qaraad, Souad Amjad, Ibrahim I. M. Manhrawy, Hanaa Fathi, Bayoumi Ali Hassan, Passent El Kafrawy |
author_sort | Mohammed Qaraad |
collection | DOAJ |
description | Feature selection is an NP-hard combinatorial problem in which the number of possible feature subsets increases exponentially with the number of features. For high-dimensional data, the goal of feature selection is to determine the smallest feature subset that is still the most informative. In this paper, we propose a hybrid feature selection optimization model for cancer classification called ENSVM. Our model is based on the Elastic Net (EN) method, which regularizes and selects variables, for gene selection in genomic microarray data. We applied three different optimization techniques, namely Social Ski-Driver (SSD), Randomized SearchCV (RS), and Elastic NetCV (ENCV), to tune the Elastic Net, combined with a traditional Support Vector Machine (SVM) for classification. To evaluate the model, we compared the results of applying ENSVM to seven genomic microarray datasets with those of the SSD-SVM model and of an SVM with a radial basis function (RBF) kernel without any feature selection method. The comparison showed the effectiveness of ENSVM in selecting the optimal feature subset that maximizes classification performance. Accordingly, minimizing the number of features matters when analyzing high-dimensional data, for performance as well as accuracy. Moreover, the ENSVM model outperformed the SSD-SVM model. |
first_indexed | 2024-12-18T00:37:12Z |
format | Article |
id | doaj.art-42573fa839bd4e6294e3254723ec0adc |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-18T00:37:12Z |
publishDate | 2021-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-42573fa839bd4e6294e3254723ec0adc; 2022-12-21T21:26:58Z; eng; IEEE; IEEE Access; 2169-3536; 2021-01-01; 9; 42884; 42895; 10.1109/ACCESS.2021.3065341; 9374967; A Hybrid Feature Selection Optimization Model for High Dimension Data Classification; Mohammed Qaraad (0); Souad Amjad (1); Ibrahim I. M. Manhrawy (2) https://orcid.org/0000-0001-8819-4556; Hanaa Fathi (3); Bayoumi Ali Hassan (4); Passent El Kafrawy (5) https://orcid.org/0000-0002-8557-4286; (0) Department of Computer Science, Faculty of Science, Abdelmalek Essaadi University, Tetouan, Morocco; (1) Department of Computer Science, Faculty of Science, Abdelmalek Essaadi University, Tetouan, Morocco; (2) Department of Basic Sciences, Modern Academy for Engineering and Technology, New Maadi, Egypt; (3) Mathematics and Computer Science Department, Faculty of Science, Menoufia University, Shebin El-Kom, Egypt; (4) Department of Operations Research decision support, Faculty of Computer Science and Information, Cairo University, Giza, Egypt; (5) Mathematics and Computer Science Department, Faculty of Science, Menoufia University, Shebin El-Kom, Egypt; Feature selection is an NP-hard combinatorial problem in which the number of possible feature subsets increases exponentially with the number of features. For high-dimensional data, the goal of feature selection is to determine the smallest feature subset that is still the most informative. In this paper, we propose a hybrid feature selection optimization model for cancer classification called ENSVM. Our model is based on the Elastic Net (EN) method, which regularizes and selects variables, for gene selection in genomic microarray data. We applied three different optimization techniques, namely Social Ski-Driver (SSD), Randomized SearchCV (RS), and Elastic NetCV (ENCV), to tune the Elastic Net, combined with a traditional Support Vector Machine (SVM) for classification. To evaluate the model, we compared the results of applying ENSVM to seven genomic microarray datasets with those of the SSD-SVM model and of an SVM with a radial basis function (RBF) kernel without any feature selection method. The comparison showed the effectiveness of ENSVM in selecting the optimal feature subset that maximizes classification performance. Accordingly, minimizing the number of features matters when analyzing high-dimensional data, for performance as well as accuracy. Moreover, the ENSVM model outperformed the SSD-SVM model.; https://ieeexplore.ieee.org/document/9374967/; Cancer classification; feature selection; genomic microarray data; parameter optimization; elastic net (EN); social ski-driver (SSD) |
spellingShingle | Mohammed Qaraad; Souad Amjad; Ibrahim I. M. Manhrawy; Hanaa Fathi; Bayoumi Ali Hassan; Passent El Kafrawy; A Hybrid Feature Selection Optimization Model for High Dimension Data Classification; IEEE Access; Cancer classification; feature selection; genomic microarray data; parameter optimization; elastic net (EN); social ski-driver (SSD) |
title | A Hybrid Feature Selection Optimization Model for High Dimension Data Classification |
title_full | A Hybrid Feature Selection Optimization Model for High Dimension Data Classification |
title_fullStr | A Hybrid Feature Selection Optimization Model for High Dimension Data Classification |
title_full_unstemmed | A Hybrid Feature Selection Optimization Model for High Dimension Data Classification |
title_short | A Hybrid Feature Selection Optimization Model for High Dimension Data Classification |
title_sort | hybrid feature selection optimization model for high dimension data classification |
topic | Cancer classification; feature selection; genomic microarray data; parameter optimization; elastic net (EN); social ski-driver (SSD) |
url | https://ieeexplore.ieee.org/document/9374967/ |