Ensemble feature selection using weighted concatenated voting for text classification
With the growing volume of high-dimensional data, selecting relevant features is often best handled by filter feature selection techniques, which offer improved generalization, faster training, dimensionality reduction, reduced overfitting, and improved model performance....
Main Authors: | Oluwaseun IGE, Keng Hoon Gan |
---|---|
Format: | Article |
Language: | English |
Published: | Nigerian Society of Physical Sciences, 2024-02-01 |
Series: | Journal of Nigerian Society of Physical Sciences |
Subjects: | Feature Selection; Text Classification; Dimensionality Reduction; Univariate Filter Methods |
Online Access: | https://journal.nsps.org.ng/index.php/jnsps/article/view/1823 |
_version_ | 1797302904875384832 |
---|---|
author | Oluwaseun IGE; Keng Hoon Gan |
collection | DOAJ |
description |
With the growing volume of high-dimensional data, selecting relevant features is often best handled by filter feature selection techniques, which offer improved generalization, faster training, dimensionality reduction, reduced overfitting, and improved model performance. However, the most widely used feature selection methods are unstable: a given method may choose different feature subsets, which in turn produce different classification accuracies. Selecting an appropriate hybrid harnesses the local feature relevance and discriminative power of filter methods for improved text classification, which is lacking in the past literature. In this paper, we propose a novel multi-univariate hybrid feature selection method (MUNIFES) for enhanced discriminative power between the features and the target class. The proposed method uses multi-iterative processes to select the best feature sets from each univariate feature selection method. MUNIFES employs an ensemble that combines the discriminative strength of multiple filters, namely Chi-Square (Chi2), Analysis of Variance (ANOVA), and Information Gain (Infogain), to select optimal feature subsets. To evaluate the proposed method, several experiments were performed on the 20newsgroup dataset and its variant (17newsgroup) with 10 classifiers (including ensemble, classification and optimization algorithms, and an Artificial Neural Network (ANN)), and the results were compared with state-of-the-art feature selection methods. The MUNIFES results indicated better classification accuracy.
|
first_indexed | 2024-03-07T23:44:41Z |
format | Article |
id | doaj.art-e2ba9a161f9847618a40187b1f4e5d41 |
institution | Directory Open Access Journal |
issn | 2714-2817 2714-4704 |
language | English |
last_indexed | 2024-03-07T23:44:41Z |
publishDate | 2024-02-01 |
publisher | Nigerian Society of Physical Sciences |
record_format | Article |
series | Journal of Nigerian Society of Physical Sciences |
spelling | doaj.art-e2ba9a161f9847618a40187b1f4e5d41; 2024-02-19T16:32:26Z; eng; Nigerian Society of Physical Sciences; Journal of Nigerian Society of Physical Sciences; 2714-2817; 2714-4704; 2024-02-01; 6; 1; 10.46481/jnsps.2024.1823; Ensemble feature selection using weighted concatenated voting for text classification; Oluwaseun IGE (0): School of Computer Sciences, Universiti Sains Malaysia, 11800 Gelugor, Pulau Pinang, Malaysia, and Universal Basic Education Commission, Wuse Zone 4, Abuja, 900284, Nigeria; Keng Hoon Gan (1): School of Computer Sciences, Universiti Sains Malaysia, 11800 Gelugor, Pulau Pinang, Malaysia; https://journal.nsps.org.ng/index.php/jnsps/article/view/1823; Feature Selection; Text Classification; Dimensionality Reduction; Univariate Filter Methods |
title | Ensemble feature selection using weighted concatenated voting for text classification |
topic | Feature Selection; Text Classification; Dimensionality Reduction; Univariate Filter Methods |
url | https://journal.nsps.org.ng/index.php/jnsps/article/view/1823 |
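
The description above names three univariate filters (Chi2, ANOVA, Infogain) combined in an ensemble, but this record does not give the voting procedure itself. The following is a minimal Python sketch of one way such a weighted vote could work, not the authors' MUNIFES implementation. Every name in it (`chi2_score`, `anova_f`, `info_gain`, `weighted_vote`, `WEIGHTS`), the top-k voting rule, the weight values, and the toy term-presence matrix are assumptions made purely for illustration. The Chi2 scorer here is the classical Pearson statistic on a feature-by-class contingency table.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(x, y):
    """Information gain H(y) - H(y | x) for a discrete feature column x."""
    n = len(y)
    cond = 0.0
    for v in set(x):
        subset = [yi for xi, yi in zip(x, y) if xi == v]
        cond += len(subset) / n * entropy(subset)
    return entropy(y) - cond


def chi2_score(x, y):
    """Pearson chi-square statistic of the feature-value by class table."""
    n = len(y)
    x_counts, y_counts, joint = Counter(x), Counter(y), Counter(zip(x, y))
    stat = 0.0
    for xv, xc in x_counts.items():
        for yv, yc in y_counts.items():
            expected = xc * yc / n
            stat += (joint[(xv, yv)] - expected) ** 2 / expected
    return stat


def anova_f(x, y):
    """One-way ANOVA F statistic of feature values grouped by class."""
    n, classes = len(y), sorted(set(y))
    grand = sum(x) / n
    between = within = 0.0
    for c in classes:
        grp = [xi for xi, yi in zip(x, y) if yi == c]
        mean = sum(grp) / len(grp)
        between += len(grp) * (mean - grand) ** 2
        within += sum((xi - mean) ** 2 for xi in grp)
    if within == 0:  # perfectly separated groups
        return math.inf if between > 0 else 0.0
    return (between / (len(classes) - 1)) / (within / (n - len(classes)))


def weighted_vote(X, y, k, weights):
    """Each filter votes for its own top-k features; votes are weighted
    per filter (hypothetical rule) and the k highest totals are returned."""
    scorers = {"chi2": chi2_score, "anova": anova_f, "infogain": info_gain}
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    votes = [0.0] * n_features
    for name, scorer in scorers.items():
        scores = [scorer(col, y) for col in columns]
        top = sorted(range(n_features), key=lambda j: (-scores[j], j))[:k]
        for j in top:
            votes[j] += weights[name]
    ranked = sorted(range(n_features), key=lambda j: (-votes[j], j))
    return [j for j in ranked if votes[j] > 0][:k]


# Toy data (assumed): rows are documents, columns are binary term presence.
X = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
]
y = [0, 0, 0, 1, 1, 1]
WEIGHTS = {"chi2": 0.4, "anova": 0.3, "infogain": 0.3}  # illustrative values

selected = weighted_vote(X, y, k=2, weights=WEIGHTS)
print(selected)
```

On this toy matrix all three filters agree that the first two columns are the most class-discriminative, so they receive every vote. On real text data the filters typically disagree, and the weights then determine which features survive.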