Assessing feature selection method performance with class imbalance data

Identifying the most informative features is a crucial step in feature selection. This paper focuses primarily on wrapper feature selection methods designed to detect important features with F1-score as the target metric. As an initial step, most wrapper methods order features according to importance. However, in most cases, the importance is defined according to the classification method used and varies with the characteristics of the data set. Using synthetically simulated data, we examine four existing feature ordering techniques to find the most desirable and effective ordering mechanism for identifying informative features. Based on the results, an improved method is suggested for extracting the most informative feature subset from the data set. The method uses the sum of absolute values of the first k principal component loadings to order the features, where k is a user-defined, application-specific value, and then applies a sequential feature selection method to extract the best subset of features. We further compare the performance of the proposed feature selection method with the existing Recursive Feature Elimination (RFE) by simulating data for several practical scenarios with different numbers of informative features and different imbalance rates. We also validate the method in a real-world application across several classification methods. The results, based on accuracy measures, indicate that the proposed approach performs better than the existing feature selection methods.
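
The ordering and search strategy described in the abstract can be illustrated with a short scikit-learn sketch. The code below is not the authors' implementation; it is a minimal reconstruction under stated assumptions: features are ranked by the sum of absolute loadings on the first k principal components, a greedy forward pass over that ordering is scored by cross-validated F1, and the selected subset is compared against a Recursive Feature Elimination baseline on an imbalanced synthetic data set. The helper names (pca_loading_order, sequential_select), the choice of logistic regression as the wrapped classifier, and the simulation settings are illustrative assumptions, not details taken from the paper.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler


def pca_loading_order(X, k):
    # Rank features by the sum of absolute loadings on the first k principal
    # components (most important first). k is user-defined and application-specific.
    X_std = StandardScaler().fit_transform(X)
    pca = PCA(n_components=k).fit(X_std)
    # components_ has shape (k, n_features); loadings scale each component
    # by the standard deviation it explains.
    loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
    importance = np.abs(loadings).sum(axis=1)
    return np.argsort(importance)[::-1]


def sequential_select(X, y, order, estimator):
    # Greedy forward pass over the given feature ordering, keeping the prefix
    # with the best cross-validated F1-score (an imbalance-aware target metric).
    best_f1, best_subset = -np.inf, []
    for i in range(1, len(order) + 1):
        subset = list(order[:i])
        f1 = cross_val_score(estimator, X[:, subset], y, cv=5, scoring="f1").mean()
        if f1 > best_f1:
            best_f1, best_subset = f1, subset
    return best_subset, best_f1


if __name__ == "__main__":
    # Imbalanced synthetic data with a handful of informative features,
    # loosely mirroring the simulation setting described in the abstract.
    X, y = make_classification(n_samples=1000, n_features=30, n_informative=5,
                               weights=[0.9, 0.1], random_state=0)
    clf = LogisticRegression(max_iter=1000)

    order = pca_loading_order(X, k=5)
    subset, f1_pca = sequential_select(X, y, order, clf)
    print(f"PCA-loading ordering + sequential search: {len(subset)} features, F1 = {f1_pca:.3f}")

    # Baseline: Recursive Feature Elimination with the same estimator and subset size.
    rfe = RFE(clf, n_features_to_select=len(subset)).fit(X, y)
    f1_rfe = cross_val_score(clf, X[:, rfe.support_], y, cv=5, scoring="f1").mean()
    print(f"RFE baseline: {len(subset)} features, F1 = {f1_rfe:.3f}")

Whether the paper's sequential step is this prefix-style forward pass or a fuller sequential search is not specified here; the sketch only demonstrates the PCA-loading ordering and an F1-based comparison against RFE.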

Bibliographic Details
Main Authors: Surani Matharaarachchi, Mike Domaratzki, Saman Muthukumarana
Author Affiliations: Department of Statistics, University of Manitoba, Winnipeg, MB, R3T 2N2, Canada (Matharaarachchi, corresponding author; Muthukumarana); Department of Computer Science, University of Manitoba, Winnipeg, MB, R3T 2N2, Canada (Domaratzki)
Format: Article
Language: English
Published: Elsevier, 2021-12-01
Series: Machine Learning with Applications, Volume 6 (December 2021), Article 100170
ISSN: 2666-8270
Subjects: Feature selection; Informative feature; Recursive feature elimination; Principal component loading
Online Access: http://www.sciencedirect.com/science/article/pii/S2666827021000852