Feature selection using law of total variance with fast correlation-based filter
The increased dimensionality of data poses a formidable obstacle to completing data mining tasks. Because high-dimensional data carry extraneous features, processing and analysis take longer and are less precise. As a pre-processing phase in the analysis of data mining tasks, feature s...
Main Authors: | Nur Atiqah, Mustapa; Azlyna, Senawi; Liang, Chuanzun |
---|---|
Format: | Conference or Workshop Item |
Language: | English |
Published: | Institute of Electrical and Electronics Engineers Inc. 2023 |
Subjects: | Q Science (General); QA Mathematics |
Online Access: | http://umpir.ump.edu.my/id/eprint/40376/1/Feature%20selection%20using%20law%20of%20total%20variance.pdf http://umpir.ump.edu.my/id/eprint/40376/2/Feature%20selection%20using%20law%20of%20total%20variance%20with%20fast%20correlation-based%20filter_ABS.pdf |
_version_ | 1796996242876661760 |
---|---|
author | Nur Atiqah, Mustapa; Azlyna, Senawi; Liang, Chuanzun |
author_facet | Nur Atiqah, Mustapa; Azlyna, Senawi; Liang, Chuanzun |
author_sort | Nur Atiqah, Mustapa |
collection | UMP |
description | The increased dimensionality of data poses a formidable obstacle to completing data mining tasks. Because high-dimensional data carry extraneous features, processing and analysis take longer and are less precise. As a pre-processing phase in data mining, feature selection is effective at reducing dimensionality, removing irrelevant features, increasing accuracy, and enhancing the readability of the results. This research proposes the law of total variance with fast correlation-based filter (LTVFCBF) as a new feature selection method. LTVFCBF selects the significant features by identifying relevant features and then removing redundant features among the relevant ones. The analysis was conducted on ten datasets of varied dimensionality to evaluate the performance of the proposed LTVFCBF, and the results were validated using four classifiers: K-nearest neighbours, Naïve Bayes, support vector machine, and bagging. The LTVFCBF and LTV methods were compared in terms of the number of selected features, classification accuracy, and execution time. Overall, LTVFCBF has the potential to reduce the dimensionality of data by selecting fewer significant features with better accuracy. However, it requires a slightly longer execution time than LTV. In addition, LTVFCBF can achieve comparable accuracy with faster execution time when fewer than half of the original features are retained. The proposed method produces promising results and may be regarded as an effective filter approach for feature selection. |
first_indexed | 2024-04-22T01:26:03Z |
format | Conference or Workshop Item |
id | UMPir40376 |
institution | Universiti Malaysia Pahang |
language | English |
last_indexed | 2024-04-22T01:26:03Z |
publishDate | 2023 |
publisher | Institute of Electrical and Electronics Engineers Inc. |
record_format | dspace |
spelling | UMPir40376 2024-04-16T04:18:34Z http://umpir.ump.edu.my/id/eprint/40376/ Feature selection using law of total variance with fast correlation-based filter. Nur Atiqah, Mustapa; Azlyna, Senawi; Liang, Chuanzun. Q Science (General); QA Mathematics. Institute of Electrical and Electronics Engineers Inc. 2023. Conference or Workshop Item. PeerReviewed. pdf en http://umpir.ump.edu.my/id/eprint/40376/1/Feature%20selection%20using%20law%20of%20total%20variance.pdf pdf en http://umpir.ump.edu.my/id/eprint/40376/2/Feature%20selection%20using%20law%20of%20total%20variance%20with%20fast%20correlation-based%20filter_ABS.pdf Nur Atiqah, Mustapa and Azlyna, Senawi and Liang, Chuanzun (2023) Feature selection using law of total variance with fast correlation-based filter. In: 8th International Conference on Software Engineering and Computer Systems, ICSECS 2023, 25-27 August 2023, Penang. pp. 35-40. (192961). ISBN 979-835031093-1 https://doi.org/10.1109/ICSECS58457.2023.10256367 |
spellingShingle | Q Science (General) QA Mathematics Nur Atiqah, Mustapa Azlyna, Senawi Liang, Chuanzun Feature selection using law of total variance with fast correlation-based filter |
title | Feature selection using law of total variance with fast correlation-based filter |
title_sort | feature selection using law of total variance with fast correlation based filter |
topic | Q Science (General) QA Mathematics |
url | http://umpir.ump.edu.my/id/eprint/40376/1/Feature%20selection%20using%20law%20of%20total%20variance.pdf http://umpir.ump.edu.my/id/eprint/40376/2/Feature%20selection%20using%20law%20of%20total%20variance%20with%20fast%20correlation-based%20filter_ABS.pdf |
work_keys_str_mv | AT nuratiqahmustapa featureselectionusinglawoftotalvariancewithfastcorrelationbasedfilter AT azlynasenawi featureselectionusinglawoftotalvariancewithfastcorrelationbasedfilter AT liangchuanzun featureselectionusinglawoftotalvariancewithfastcorrelationbasedfilter |
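
The description field outlines a two-stage filter: score each feature for relevance, then drop features that are redundant with ones already kept. The record does not give the paper's formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes relevance is scored from the law of total variance, Var(X) = E[Var(X|Y)] + Var(E[X|Y]), as the share Var(E[X|Y]) / Var(X) of a feature's variance explained by the class label. For the redundancy step it swaps in squared Pearson correlation between features, whereas the original fast correlation-based filter uses symmetrical uncertainty. The function names and the `relevance_threshold` parameter are hypothetical.

```python
import numpy as np


def ltv_relevance(x, y):
    """Assumed relevance score: Var(E[x|y]) / Var(x), a value in [0, 1]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    total_var = x.var()  # population variance of the feature
    if total_var == 0.0:
        return 0.0  # a constant feature carries no class information
    overall_mean = x.mean()
    between = 0.0
    for label in np.unique(y):
        group = x[y == label]
        weight = group.size / x.size
        # Contribution of this class to Var(E[x|y])
        between += weight * (group.mean() - overall_mean) ** 2
    return between / total_var


def select_features(X, y, relevance_threshold=0.1):
    """Hypothetical two-stage filter: relevance ranking, then redundancy removal."""
    X = np.asarray(X, dtype=float)
    scores = np.array([ltv_relevance(X[:, j], y) for j in range(X.shape[1])])

    # Stage 1: keep relevant features, ordered from most to least relevant.
    ranked = np.argsort(-scores, kind="stable")
    candidates = [int(j) for j in ranked
                  if scores[j] > 0.0 and scores[j] >= relevance_threshold]

    # Stage 2 (FCBF-style rule, with squared Pearson correlation as an
    # assumed stand-in for symmetrical uncertainty): drop a candidate when
    # it is tied at least as strongly to a kept feature as to the class.
    selected = []
    for j in candidates:
        redundant = False
        for k in selected:
            r = np.corrcoef(X[:, j], X[:, k])[0, 1]
            if r ** 2 >= scores[j]:
                redundant = True
                break
        if not redundant:
            selected.append(j)
    return selected, scores


# Toy data (made up for illustration): column 1 is a linear copy of
# column 0, column 2 ignores the class, column 3 is moderately relevant.
y_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1])
f0 = np.array([1, 2, 1, 2, 5, 6, 5, 6], dtype=float)
X_demo = np.column_stack([f0, 2 * f0 + 1,
                          [1, 2, 1, 2, 1, 2, 1, 2],
                          [0, 0, 1, 1, 1, 1, 2, 2]])
selected_demo, scores_demo = select_features(X_demo, y_demo)
print("selected columns:", selected_demo)
print("relevance scores:", np.round(scores_demo, 3))
```

On this toy data the sketch keeps only one of the two linearly dependent columns and discards the column that does not vary with the class. Both quantities compared in the redundancy test are proportions of explained variance, which is why they can be placed on the same scale here.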