Ensemble feature selection using weighted concatenated voting for text classification

With the growing volume of high-dimensional data, selecting relevant features is often best handled by filter feature selection techniques, which offer improved generalization, faster training, dimensionality reduction, less overfitting, and improved model performance....

Bibliographic Details
Main Authors: Oluwaseun IGE, Keng Hoon Gan
Format: Article
Language: English
Published: Nigerian Society of Physical Sciences 2024-02-01
Series: Journal of Nigerian Society of Physical Sciences
Subjects: Feature Selection; Text Classification; Dimensionality Reduction; Univariate Filter Methods
Online Access: https://journal.nsps.org.ng/index.php/jnsps/article/view/1823
_version_ 1797302904875384832
author Oluwaseun IGE
Keng Hoon Gan
collection DOAJ
description With the growing volume of high-dimensional data, selecting relevant features is often best handled by filter feature selection techniques, which offer improved generalization, faster training, dimensionality reduction, less overfitting, and improved model performance. However, the most widely used feature selection methods are unstable: a given method can choose different feature subsets that yield different classification accuracy. An appropriately chosen hybrid harnesses the local feature relevance and the discriminative power of filter methods for improved text classification, an approach that is lacking in past literature. In this paper, we propose a novel multi-univariate hybrid feature selection method (MUNIFES) for enhanced discriminative power between the features and the target class. The proposed method uses multi-iterative processes to select the best feature sets from each univariate feature selection method. MUNIFES employs an ensemble of the multi-filter discriminative strengths of Chi-Square (Chi2), Analysis of Variance (ANOVA), and Infogain methods to select optimal feature subsets. To evaluate the proposed method, several experiments were performed on the 20newsgroup dataset and its variant (17newsgroup) with 10 classifiers (including ensemble, classification and optimization algorithms, and an Artificial Neural Network (ANN)), and the results were compared with state-of-the-art feature selection methods. The results indicate that MUNIFES achieves better classification accuracy.
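The idea of combining several univariate filters by voting can be illustrated with a minimal sketch. This is not the authors' MUNIFES algorithm, whose details are not given in this record; it only assumes that each filter (Chi2, ANOVA, Infogain) has already produced one relevance score per feature, for instance via scikit-learn's chi2, f_classif, and mutual_info_classif. The function names top_k and weighted_vote, the equal weights, and the toy scores below are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def top_k(scores, k):
    """Return the k highest-scoring feature names for one filter method."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def weighted_vote(method_scores, weights, k, n_select):
    """Combine per-method top-k lists by weighted voting (illustrative only)."""
    votes = defaultdict(float)
    for method, scores in method_scores.items():
        # Each method casts a vote, worth its weight, for each of its top-k features.
        for feature in top_k(scores, k):
            votes[feature] += weights[method]
    # Rank by total vote (descending), breaking ties alphabetically.
    ranked = sorted(votes, key=lambda f: (-votes[f], f))
    return ranked[:n_select]


# Toy, made-up scores for five terms; these are NOT results from the paper.
method_scores = {
    "chi2": {"goal": 9.0, "team": 7.5, "the": 0.2, "orbit": 6.0, "and": 0.1},
    "anova": {"goal": 8.0, "team": 3.0, "the": 0.3, "orbit": 7.0, "and": 0.2},
    "infogain": {"goal": 0.6, "team": 0.5, "the": 0.01, "orbit": 0.1, "and": 0.02},
}
# Equal weights are an assumption; a real scheme might weight methods differently.
weights = {"chi2": 1.0, "anova": 1.0, "infogain": 1.0}

selected = weighted_vote(method_scores, weights, k=2, n_select=2)
print(selected)
```

With these toy inputs, "goal" is in every method's top two and "team" is in two of the three, so those two features receive the most votes. Voting over per-method shortlists, rather than averaging raw scores, avoids having to put Chi2, ANOVA, and Infogain scores on a common scale.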
first_indexed 2024-03-07T23:44:41Z
format Article
id doaj.art-e2ba9a161f9847618a40187b1f4e5d41
institution Directory of Open Access Journals
issn 2714-2817
2714-4704
language English
last_indexed 2024-03-07T23:44:41Z
publishDate 2024-02-01
publisher Nigerian Society of Physical Sciences
record_format Article
series Journal of Nigerian Society of Physical Sciences
spelling doaj.art-e2ba9a161f9847618a40187b1f4e5d41
2024-02-19T16:32:26Z
eng
Nigerian Society of Physical Sciences
Journal of Nigerian Society of Physical Sciences
2714-2817
2714-4704
2024-02-01
6
1
10.46481/jnsps.2024.1823
Ensemble feature selection using weighted concatenated voting for text classification
Oluwaseun IGE (0)
Keng Hoon Gan (1)
(0) School of Computer Sciences, Universiti Sains Malaysia, 11800 Gelugor, Pulau Pinang, Malaysia | Universal Basic Education Commission, Wuse Zone 4, Abuja, 900284, Nigeria
(1) School of Computer Sciences, Universiti Sains Malaysia, 11800 Gelugor, Pulau Pinang, Malaysia
https://journal.nsps.org.ng/index.php/jnsps/article/view/1823
Feature Selection
Text Classification
Dimensionality Reduction
Univariate Filter Methods
title Ensemble feature selection using weighted concatenated voting for text classification
topic Feature Selection
Text Classification
Dimensionality Reduction
Univariate Filter Methods
url https://journal.nsps.org.ng/index.php/jnsps/article/view/1823