A Hybrid Feature Selection Optimization Model for High Dimension Data Classification

Bibliographic Details
Main Authors: Mohammed Qaraad, Souad Amjad, Ibrahim I. M. Manhrawy, Hanaa Fathi, Bayoumi Ali Hassan, Passent El Kafrawy
Format: Article
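Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: Cancer classification; feature selection; genomic microarray data; parameter optimization; elastic net (EN); social ski-driver (SSD)
Online Access: https://ieeexplore.ieee.org/document/9374967/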
author Mohammed Qaraad
Souad Amjad
Ibrahim I. M. Manhrawy
Hanaa Fathi
Bayoumi Ali Hassan
Passent El Kafrawy
collection DOAJ
description Feature selection is an NP-hard combinatorial problem in which the number of possible feature subsets increases exponentially with the number of features. For high-dimensional data, the goal of feature selection is to find the smallest subset of the most informative features. In this paper, we propose a hybrid feature selection optimization model for cancer classification called ENSVM. The model uses the Elastic Net (EN) method, which regularizes and selects variables, for gene selection from genomic microarray data. We applied three optimization techniques, namely Social Ski-Driver (SSD), Randomized SearchCV (RS), and Elastic NetCV (ENCV), to tune the Elastic Net, combined with a traditional Support Vector Machine (SVM) for classification. To evaluate the model, we compared the results of applying ENSVM to seven genomic microarray datasets with those of the SSD-SVM model and of an SVM with an RBF kernel without any feature selection method. The comparison showed that ENSVM selects the feature subset that maximized classification performance. Accordingly, minimizing the number of features is significant for performance as well as accuracy when analyzing high-dimensional data. Moreover, the ENSVM model outperformed the SSD-SVM model.
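The pipeline outlined in the description (Elastic Net for feature selection, followed by an SVM classifier) can be illustrated with a minimal sketch. This is not the authors' ENSVM implementation: it assumes scikit-learn, uses synthetic data from `make_classification` as a stand-in for microarray data, covers only the ElasticNetCV variant (the Social Ski-Driver and randomized-search tuners are not shown), and every parameter value below is an illustrative assumption.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for high-dimensional data (assumption: not real microarray data).
X, y = make_classification(
    n_samples=200, n_features=500, n_informative=10, n_redundant=0,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Elastic Net regression on the 0/1 labels, with its penalty strength chosen by
# cross-validation; features whose coefficient magnitude exceeds the threshold are kept.
selector = SelectFromModel(
    ElasticNetCV(l1_ratio=[0.5, 0.7, 0.9], cv=5, max_iter=10000, random_state=0),
    threshold=1e-5,
)

# Elastic Net selection followed by an RBF-kernel SVM.
en_svm = make_pipeline(StandardScaler(), selector, SVC(kernel="rbf"))
en_svm.fit(X_train, y_train)

# Baseline: the same SVM without any feature selection.
baseline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
baseline.fit(X_train, y_train)

n_selected = int(en_svm.named_steps["selectfrommodel"].get_support().sum())
acc_en = en_svm.score(X_test, y_test)
acc_base = baseline.score(X_test, y_test)

print(f"features kept: {n_selected} of {X.shape[1]}")
print(f"accuracy with selection: {acc_en:.3f}, without: {acc_base:.3f}")
```

Placing the selector inside the pipeline means the Elastic Net is fitted on the training split only, so the test accuracy is not influenced by features chosen with knowledge of the test samples.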
first_indexed 2024-12-18T00:37:12Z
format Article
id doaj.art-42573fa839bd4e6294e3254723ec0adc
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-18T00:37:12Z
publishDate 2021-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-42573fa839bd4e6294e3254723ec0adc
Timestamp: 2022-12-21T21:26:58Z
Language: eng
Publisher: IEEE
Series: IEEE Access
ISSN: 2169-3536
Published: 2021-01-01
Volume: 9
Pages: 42884-42895
DOI: 10.1109/ACCESS.2021.3065341
Document number: 9374967
Title: A Hybrid Feature Selection Optimization Model for High Dimension Data Classification
Authors:
0. Mohammed Qaraad, Department of Computer Science, Faculty of Science, Abdelmalek Essaadi University, Tetouan, Morocco
1. Souad Amjad, Department of Computer Science, Faculty of Science, Abdelmalek Essaadi University, Tetouan, Morocco
2. Ibrahim I. M. Manhrawy (https://orcid.org/0000-0001-8819-4556), Department of Basic Sciences, Modern Academy for Engineering and Technology, New Maadi, Egypt
3. Hanaa Fathi, Mathematics and Computer Science Department, Faculty of Science, Menoufia University, Shebin El-Kom, Egypt
4. Bayoumi Ali Hassan, Department of Operations Research decision support, Faculty of Computer Science and Information, Cairo University, Giza, Egypt
5. Passent El Kafrawy (https://orcid.org/0000-0002-8557-4286), Mathematics and Computer Science Department, Faculty of Science, Menoufia University, Shebin El-Kom, Egypt
Online Access: https://ieeexplore.ieee.org/document/9374967/
Subjects: Cancer classification; feature selection; genomic microarray data; parameter optimization; elastic net (EN); social ski-driver (SSD)
title A Hybrid Feature Selection Optimization Model for High Dimension Data Classification
topic Cancer classification
feature selection
genomic microarray data
parameter optimization
elastic net (EN)
social ski-driver (SSD)
url https://ieeexplore.ieee.org/document/9374967/