Feature selection using law of total variance with fast correlation-based filter

The increased dimensionality of data poses a formidable obstacle to completing data mining tasks. Due to the extraneous features associated with high-dimensional data, processing and analysis take longer and are less precise. As a pre-processing phase in the analysis of data mining tasks, feature s...

Bibliographic Details
Main Authors: Nur Atiqah, Mustapa; Azlyna, Senawi; Liang, Chuanzun
Format: Conference or Workshop Item
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2023
Subjects: Q Science (General); QA Mathematics
Online Access: http://umpir.ump.edu.my/id/eprint/40376/1/Feature%20selection%20using%20law%20of%20total%20variance.pdf
http://umpir.ump.edu.my/id/eprint/40376/2/Feature%20selection%20using%20law%20of%20total%20variance%20with%20fast%20correlation-based%20filter_ABS.pdf
author Nur Atiqah, Mustapa
Azlyna, Senawi
Liang, Chuanzun
collection UMP
description The increased dimensionality of data poses a formidable obstacle to completing data mining tasks. Due to the extraneous features associated with high-dimensional data, processing and analysis take longer and are less precise. As a pre-processing phase in the analysis of data mining tasks, feature selection is effective at reducing dimensionality, removing irrelevant characteristics, increasing accuracy, and enhancing the readability of the results. This research proposes the law of total variance with fast correlation-based filter (LTVFCBF) as a new feature selection method. LTVFCBF chooses the significant features by identifying relevant features and removing redundant features among the relevant ones. The analysis was conducted with ten datasets of varied dimensionality to evaluate the performance of the proposed LTVFCBF and validated using four classifiers: K-nearest neighbours, Naïve Bayes, support vector machine, and bagging. The LTVFCBF and LTV methods were compared in terms of the number of selected features, classification accuracy, and execution time. Overall, the suggested LTVFCBF has the potential to minimize the dimensionality of data by selecting a lower number of significant features with better accuracy. However, it requires a slightly higher execution time than LTV. Aside from that, LTVFCBF can achieve comparable accuracy with faster execution time when fewer than half of the original features are retained. The proposed method can produce a promising outcome and may be regarded as an effective filter approach for feature selection.
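The two-stage idea in the description (score features for relevance, then drop redundant ones among the relevant) can be illustrated with a minimal sketch. This record does not give the paper's actual scoring rule or redundancy criterion, so everything below is an assumption: relevance is taken as the share of a feature's variance explained by the class label, using the law of total variance Var(X) = E[Var(X | Y)] + Var(E[X | Y]), and redundancy is checked with absolute Pearson correlation against a fixed threshold, which is a simplification of the symmetrical-uncertainty test used by the original fast correlation-based filter. The function names and thresholds are hypothetical.

```python
import numpy as np


def explained_variance_ratio(x, y):
    """Share of Var(x) explained by the class label y (assumed relevance score).

    Law of total variance: Var(x) = E[Var(x | y)] + Var(E[x | y]).
    Returns Var(E[x | y]) / Var(x), a value in [0, 1].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    total = x.var()
    if total == 0:
        return 0.0  # a constant feature carries no information
    classes, counts = np.unique(y, return_counts=True)
    weights = counts / counts.sum()
    class_means = np.array([x[y == c].mean() for c in classes])
    between = np.sum(weights * (class_means - x.mean()) ** 2)
    return float(between / total)


def select_features(X, y, relevance_threshold=0.1, redundancy_threshold=0.9):
    """Hypothetical two-stage filter: relevance ranking, then redundancy removal."""
    X = np.asarray(X, dtype=float)
    scores = np.array(
        [explained_variance_ratio(X[:, j], y) for j in range(X.shape[1])]
    )
    # Stage 1: keep features whose score passes the threshold, best first.
    candidates = [j for j in np.argsort(-scores) if scores[j] >= relevance_threshold]
    # Stage 2: drop a candidate if it is highly correlated with a kept feature.
    selected = []
    for j in candidates:
        redundant = any(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) >= redundancy_threshold
            for k in selected
        )
        if not redundant:
            selected.append(int(j))
    return selected, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1], 50)
    f0 = y + 0.1 * rng.standard_normal(100)    # tracks the class label
    f1 = f0 + 0.01 * rng.standard_normal(100)  # near copy of f0
    f2 = rng.standard_normal(100)              # unrelated noise
    selected, scores = select_features(np.column_stack([f0, f1, f2]), y)
    print(selected, scores.round(3))
```

Both thresholds are illustrative defaults rather than values from the paper; in practice they would be tuned per dataset, and the selected subset would then be passed to a classifier for validation.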
first_indexed 2024-04-22T01:26:03Z
format Conference or Workshop Item
id UMPir40376
institution Universiti Malaysia Pahang
language English
last_indexed 2024-04-22T01:26:03Z
publishDate 2023
publisher Institute of Electrical and Electronics Engineers Inc.
record_format dspace
spelling UMPir40376 2024-04-16T04:18:34Z http://umpir.ump.edu.my/id/eprint/40376/ PeerReviewed
Nur Atiqah, Mustapa and Azlyna, Senawi and Liang, Chuanzun (2023) Feature selection using law of total variance with fast correlation-based filter. In: 8th International Conference on Software Engineering and Computer Systems, ICSECS 2023, 25-27 August 2023, Penang. pp. 35-40. (192961). ISBN 979-835031093-1 https://doi.org/10.1109/ICSECS58457.2023.10256367
title Feature selection using law of total variance with fast correlation-based filter
topic Q Science (General)
QA Mathematics
url http://umpir.ump.edu.my/id/eprint/40376/1/Feature%20selection%20using%20law%20of%20total%20variance.pdf
http://umpir.ump.edu.my/id/eprint/40376/2/Feature%20selection%20using%20law%20of%20total%20variance%20with%20fast%20correlation-based%20filter_ABS.pdf