Assessing feature selection method performance with class imbalance data
Identifying the most informative features is a crucial step in feature selection. This paper focuses primarily on wrapper feature selection methods designed to detect important features with F1-score as the target metric. As an initial step, most wrapper methods order features according to importance.
Main Authors: | Surani Matharaarachchi, Mike Domaratzki, Saman Muthukumarana |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2021-12-01 |
Series: | Machine Learning with Applications |
Subjects: | Feature selection; Informative feature; Recursive feature elimination; Principal component loading |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2666827021000852 |
_version_ | 1819096853613903872 |
---|---|
author | Surani Matharaarachchi; Mike Domaratzki; Saman Muthukumarana |
author_facet | Surani Matharaarachchi; Mike Domaratzki; Saman Muthukumarana |
author_sort | Surani Matharaarachchi |
collection | DOAJ |
description | Identifying the most informative features is a crucial step in feature selection. This paper focuses primarily on wrapper feature selection methods designed to detect important features with F1-score as the target metric. As an initial step, most wrapper methods order features according to importance. However, in most cases, importance is defined by the classification method used and varies with the characteristics of the data set. Using synthetically simulated data, we examine four existing feature ordering techniques to find the most desirable and most effective ordering mechanism for identifying informative features. Using the results, an improved method is suggested to extract the most informative feature subset from the data set. The method uses the sum of absolute values of the first k principal component loadings to order the features, where k is a user-defined, application-specific value. It then applies a sequential feature selection method to extract the best subset of features. We further compare the performance of the proposed feature selection method with results from the existing Recursive Feature Elimination (RFE) by simulating data for several practical scenarios with different numbers of informative features and different imbalance rates. We also validate the method using a real-world application on several classification methods. The results based on the accuracy measures indicate that the proposed approach performs better than the existing feature selection methods. |
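The ordering mechanism described above (ranking features by the sum of absolute values of the first k principal component loadings, then running sequential feature selection with F1 as the scoring metric) can be sketched as follows. This is a minimal illustration using scikit-learn; the function name, simulated data set, choice of classifier, and pre-selection of the top-ranked features are assumptions for the sketch, not the authors' code:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

def pca_loading_order(X, k):
    """Rank features by the sum of |loadings| over the first k principal components."""
    pca = PCA(n_components=k).fit(X)
    # pca.components_ has shape (k, n_features); each column holds one
    # feature's loadings on the first k components.
    scores = np.abs(pca.components_).sum(axis=0)
    return np.argsort(scores)[::-1]  # most important feature first

# Imbalanced toy data: 5 informative features out of 20, ~10% positives.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)

order = pca_loading_order(X, k=5)

# Greedy forward selection over the top-ranked candidates, scored by F1
# (the paper's target metric); restricting to the 10 best-ranked features
# is an illustrative shortcut.
sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=5, scoring="f1")
sfs.fit(X[:, order[:10]], y)
selected = order[:10][sfs.get_support()]
print(sorted(selected.tolist()))
```

Here k controls how many components contribute to the ordering; in practice it would be tuned per application, as the description notes.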
first_indexed | 2024-12-22T00:05:48Z |
format | Article |
id | doaj.art-2f0ebbe5ca004d1e8d8e1f75896d796c |
institution | Directory Open Access Journal |
issn | 2666-8270 |
language | English |
last_indexed | 2024-12-22T00:05:48Z |
publishDate | 2021-12-01 |
publisher | Elsevier |
record_format | Article |
series | Machine Learning with Applications |
spelling | doaj.art-2f0ebbe5ca004d1e8d8e1f75896d796c; 2022-12-21T18:45:34Z; eng; Elsevier; Machine Learning with Applications; 2666-8270; 2021-12-01; Vol. 6, Article 100170; Assessing feature selection method performance with class imbalance data |
affiliations | Surani Matharaarachchi: Department of Statistics, University of Manitoba, Winnipeg, MB, R3T 2N2, Canada (corresponding author); Mike Domaratzki: Department of Computer Science, University of Manitoba, Winnipeg, MB, R3T 2N2, Canada; Saman Muthukumarana: Department of Statistics, University of Manitoba, Winnipeg, MB, R3T 2N2, Canada |
spellingShingle | Surani Matharaarachchi; Mike Domaratzki; Saman Muthukumarana; Assessing feature selection method performance with class imbalance data; Machine Learning with Applications; Feature selection; Informative feature; Recursive feature elimination; Principal component loading |
title | Assessing feature selection method performance with class imbalance data |
title_full | Assessing feature selection method performance with class imbalance data |
title_fullStr | Assessing feature selection method performance with class imbalance data |
title_full_unstemmed | Assessing feature selection method performance with class imbalance data |
title_short | Assessing feature selection method performance with class imbalance data |
title_sort | assessing feature selection method performance with class imbalance data |
topic | Feature selection; Informative feature; Recursive feature elimination; Principal component loading |
url | http://www.sciencedirect.com/science/article/pii/S2666827021000852 |
work_keys_str_mv | AT suranimatharaarachchi assessingfeatureselectionmethodperformancewithclassimbalancedata AT mikedomaratzki assessingfeatureselectionmethodperformancewithclassimbalancedata AT samanmuthukumarana assessingfeatureselectionmethodperformancewithclassimbalancedata |