Ensemble feature selection using weighted concatenated voting for text classification
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: | Nigerian Society of Physical Sciences, 2024-02-01 |
Series: | Journal of Nigerian Society of Physical Sciences |
Subjects: | |
Online Access: | https://journal.nsps.org.ng/index.php/jnsps/article/view/1823 |
Summary: | As high-dimensional data become more common, selecting relevant features is often best handled by filter feature selection techniques, which offer improved generalization, faster training, dimensionality reduction, less overfitting, and better model performance. However, the most widely used feature selection methods are unstable: a given method may choose different feature subsets, which in turn yield different classification accuracies. An appropriately chosen hybrid can harness the local feature relevance and discriminative power of filter methods to improve text classification, an approach that is lacking in the literature. In this paper, we propose a novel multi-univariate hybrid feature selection method (MUNIFES) that enhances discriminative power between the features and the target class. The proposed method uses a multi-iterative process to select the best feature set from each univariate feature selection method. MUNIFES combines the discriminative strengths of the Chi-Square (Chi2), Analysis of Variance (ANOVA), and Infogain filters in an ensemble to select optimal feature subsets. To evaluate the proposed method, several experiments were performed on the 20newsgroup dataset and its variant (17newsgroup) with 10 classifiers (including ensemble, classification, and optimization algorithms and an Artificial Neural Network (ANN)), and the results were compared with state-of-the-art feature selection methods. The results indicate that MUNIFES achieves better classification accuracy.
|
---|---|
ISSN: | 2714-2817 2714-4704 |
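
The summary describes combining several univariate filters (Chi2, ANOVA, Infogain) by voting. The record does not give the MUNIFES algorithm itself, so the following is only a minimal sketch of one plausible weighted rank-voting scheme, not the authors' method. The function names, the equal weights, and the per-term scores are all hypothetical and chosen for illustration; in practice the scores might come from scikit-learn's `chi2`, `f_classif`, and `mutual_info_classif`.

```python
from typing import Dict, List


def rank_votes(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn raw filter scores into votes in (0, 1]; the top feature gets 1.0."""
    # Sort by descending score, breaking ties alphabetically for determinism.
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    n = len(ordered)
    return {feat: (n - pos) / n for pos, feat in enumerate(ordered)}


def weighted_vote_select(
    filter_scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    k: int,
) -> List[str]:
    """Sum weighted rank votes across filters and keep the k best features."""
    totals: Dict[str, float] = {}
    for name, scores in filter_scores.items():
        for feat, vote in rank_votes(scores).items():
            totals[feat] = totals.get(feat, 0.0) + weights[name] * vote
    ranked = sorted(totals, key=lambda f: (-totals[f], f))
    return ranked[:k]


# Hypothetical scores for five terms; NOT taken from the paper.
filter_scores = {
    "chi2": {"space": 9.1, "nasa": 8.4, "the": 0.2, "hockey": 7.7, "and": 0.1},
    "anova": {"space": 6.0, "nasa": 6.5, "the": 0.3, "hockey": 5.9, "and": 0.4},
    "infogain": {"space": 0.30, "nasa": 0.28, "the": 0.01, "hockey": 0.31, "and": 0.02},
}
# Equal weights are an assumption; a weighted scheme could favor one filter.
weights = {"chi2": 1.0, "anova": 1.0, "infogain": 1.0}

selected = weighted_vote_select(filter_scores, weights, k=3)
print(selected)  # ['space', 'nasa', 'hockey']
```

Voting on ranks rather than raw scores keeps the three filters comparable, since Chi2, ANOVA F-values, and information gain live on different scales.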