Feature selection using law of total variance with fast correlation-based filter

Bibliographic Details
Main Authors: Nur Atiqah, Mustapa, Azlyna, Senawi, Liang, Chuanzun
Format: Conference or Workshop Item
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2023
Online Access: http://umpir.ump.edu.my/id/eprint/40376/1/Feature%20selection%20using%20law%20of%20total%20variance.pdf
http://umpir.ump.edu.my/id/eprint/40376/2/Feature%20selection%20using%20law%20of%20total%20variance%20with%20fast%20correlation-based%20filter_ABS.pdf
Description
Summary: The increased dimensionality of data poses a formidable obstacle to completing data mining tasks. Because high-dimensional data carry extraneous features, processing and analysis take longer and are less precise. As a pre-processing phase in data mining, feature selection is effective at reducing dimensionality, removing irrelevant features, increasing accuracy, and enhancing the readability of the results. This research proposes the law of total variance with fast correlation-based filter (LTVFCBF) as a new feature selection method. LTVFCBF selects the significant features by identifying the relevant features and then removing the redundant ones among them. The proposed LTVFCBF was evaluated on ten datasets of varied dimensionality and validated using four classifiers: K-nearest neighbours, Naïve Bayes, support vector machine, and bagging. The LTVFCBF and LTV methods were compared in terms of the number of selected features, classification accuracy, and execution time. Overall, the proposed LTVFCBF has the potential to reduce the dimensionality of data by selecting fewer significant features with better accuracy, although it requires a slightly higher execution time than LTV. In addition, LTVFCBF can achieve comparable accuracy with faster execution time when fewer than half of the original features are retained. The proposed method produces promising outcomes and may be regarded as an effective filter approach for feature selection.
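
The two-stage idea in the summary (score relevance, then drop redundant features among the relevant ones) can be illustrated with a minimal sketch. This is not the authors' implementation, whose details are not given in this record. It assumes that relevance is measured by the share of a feature's variance explained by the class label, Var(E[X|Y]) / Var(X), which follows from the law of total variance, and that redundancy is judged by an FCBF-style rule that uses squared Pearson correlation in place of FCBF's symmetrical uncertainty. The names `ltv_relevance`, `select_features`, and `relevance_threshold` are hypothetical.

```python
import numpy as np


def ltv_relevance(x, y):
    """Share of Var(x) explained by class label y: Var(E[x|y]) / Var(x).

    Assumed relevance score, based on the decomposition
    Var(x) = E[Var(x|y)] + Var(E[x|y]).
    """
    total = x.var()
    if total == 0:
        return 0.0  # a constant feature carries no information
    classes, counts = np.unique(y, return_counts=True)
    class_means = np.array([x[y == c].mean() for c in classes])
    # Weighted variance of the class means around the overall mean.
    between = np.sum(counts * (class_means - x.mean()) ** 2) / len(x)
    return between / total


def select_features(X, y, relevance_threshold=0.1):
    """Return (selected column indices, relevance scores).

    Stage 1 keeps features whose score reaches relevance_threshold
    (a hypothetical cut-off). Stage 2 walks the survivors from most to
    least relevant and drops a feature when its squared correlation with
    an already selected feature is at least its own relevance score
    (an assumed stand-in for the FCBF redundancy test).
    """
    scores = np.array([ltv_relevance(X[:, j], y) for j in range(X.shape[1])])
    ranked = [j for j in np.argsort(-scores) if scores[j] >= relevance_threshold]
    selected = []
    for j in ranked:
        redundant = any(
            np.corrcoef(X[:, j], X[:, k])[0, 1] ** 2 >= scores[j]
            for k in selected
        )
        if not redundant:
            selected.append(j)
    return selected, scores
```

Calling `select_features(X, y)` on a numeric feature matrix `X` and a label vector `y` returns the retained column indices together with every feature's relevance score. The threshold and the redundancy rule are illustrative choices and would need tuning against a real dataset.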