Online Streaming Features Selection via Markov Blanket
Main Authors: | Waqar Khan, Lingfu Kong, Brekhna Brekhna, Ling Wang, Huigui Yan |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-01-01 |
Series: | Symmetry |
Subjects: | |
Online Access: | https://www.mdpi.com/2073-8994/14/1/149 |
author | Waqar Khan, Lingfu Kong, Brekhna Brekhna, Ling Wang, Huigui Yan |
collection | DOAJ |
description | Streaming feature selection has always been an excellent method for selecting the relevant subset of features from high-dimensional data and overcoming learning complexity. However, little attention has been paid to online feature selection through the Markov Blanket (MB). Several studies based on traditional MB learning reported low prediction accuracy and used few datasets, because the number of conditional independence tests is high and consumes considerable time. This paper presents a novel algorithm called Online Feature Selection via Markov Blanket (OFSVMB), based on a statistical conditional independence test, that offers high accuracy and lower computation time. It reduces the number of conditional independence tests and incorporates online relevance and redundancy analysis: it checks the relevance between each incoming feature and the target variable T, discards redundant features from the Parents-Children (PC) and Spouses (SP) sets online, and finds PC and SP simultaneously. The performance of OFSVMB is compared with traditional MB learning algorithms (IAMB, STMB, HITON-MB, BAMB, and EEMB) and with streaming feature selection algorithms (OSFS, Alpha-investing, and SAOLA) on 9 benchmark Bayesian Network (BN) datasets and 14 real-world datasets. For the performance evaluation, F1, precision, and recall are measured at significance levels of 0.01 and 0.05 on the benchmark BN and real-world datasets, and prediction accuracy is measured with 12 classifiers at a significance level of 0.01. On benchmark BN datasets with sample sizes of 500 and 5000, OFSVMB achieved significantly higher F1, precision, and recall than IAMB, STMB, HITON-MB, BAMB, and EEMB, and ran faster. It finds a more accurate MB regardless of the size of the feature set. On the real-world datasets, OFSVMB offers substantial improvements in mean prediction accuracy over OSFS, Alpha-investing, and SAOLA across the 12 classifiers, with both small and large sample sizes, but it is slower than these three algorithms because they find only the PC set and not SP. Furthermore, the sensitivity analysis shows that OFSVMB is more accurate in selecting the optimal features. |
first_indexed | 2024-03-10T00:26:09Z |
id | doaj.art-e58208b4cd8440778d8b411fae484b3c |
institution | Directory of Open Access Journals |
issn | 2073-8994 |
last_indexed | 2024-03-10T00:26:09Z |
doi | 10.3390/sym14010149 |
citation | Symmetry, vol. 14, no. 1, article 149 (2022) |
affiliations | Waqar Khan, Lingfu Kong, Ling Wang, Huigui Yan: School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China; Brekhna Brekhna: School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan 250014, China |
title | Online Streaming Features Selection via Markov Blanket |
topic | Bayesian network; Markov blanket; streaming feature; feature selection; big data; conditional independence test |
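The description says OFSVMB screens each incoming feature with a conditional independence test (online relevance) and then prunes features that become redundant given others already selected (online redundancy). The abstract does not name the test, so the following is only a minimal sketch under stated assumptions: it uses Fisher's z test on partial correlations (suited to roughly Gaussian, linear data; discrete benchmark BNs would usually call for a G² or χ² test instead), bounds conditioning-set size with a hypothetical `max_k`, and covers only the Parents-Children step, not spouse discovery. The names `stream_pc`, `independent`, `partial_corr`, `pearson`, `walsh`, `alpha`, and `max_k` are illustrative and not taken from the paper; the loop resembles generic OSFS-style relevance and redundancy analysis rather than the OFSVMB algorithm itself.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Sample Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant column carries no information
    return cov / math.sqrt(va * vb)


def partial_corr(cols, x, y, cond):
    """Partial correlation of x and y given the variables in cond (recursive formula)."""
    if not cond:
        return pearson(cols[x], cols[y])
    z, rest = cond[0], cond[1:]
    r_xy = partial_corr(cols, x, y, rest)
    r_xz = partial_corr(cols, x, z, rest)
    r_yz = partial_corr(cols, y, z, rest)
    denom = math.sqrt(max((1 - r_xz ** 2) * (1 - r_yz ** 2), 0.0))
    if denom == 0.0:
        return 0.0  # simplification: perfect collinearity treated as "no extra information"
    return (r_xy - r_xz * r_yz) / denom


def independent(cols, x, y, cond, alpha):
    """Assumed Fisher's z test: True if x is judged independent of y given cond."""
    dof = len(cols[x]) - len(cond) - 3
    if dof <= 0:
        return True  # too few samples to reject independence
    r = max(min(partial_corr(cols, x, y, tuple(cond)), 0.999999), -0.999999)
    stat = math.sqrt(dof) * abs(math.atanh(r))
    p_value = math.erfc(stat / math.sqrt(2))  # two-sided normal p-value
    return p_value >= alpha


def stream_pc(stream, target, cols, alpha=0.01, max_k=2):
    """Hypothetical online relevance + redundancy analysis (PC step only)."""
    pc = []
    for name in stream:  # features arrive one at a time
        # Online relevance: discard a feature marginally independent of T.
        if independent(cols, name, target, (), alpha):
            continue
        pc.append(name)
        # Online redundancy: drop any member that becomes independent of T
        # given some subset (size 1..max_k) of the other current members.
        for y in list(pc):
            others = [f for f in pc if f != y]
            if any(independent(cols, y, target, s, alpha)
                   for k in range(1, min(max_k, len(others)) + 1)
                   for s in combinations(others, k)):
                pc.remove(y)
    return pc


def walsh(m, n=64):
    """Walsh sequence: zero-mean, mutually orthogonal +/-1 columns for distinct m >= 1."""
    return [(-1.0) ** bin(m & i).count("1") for i in range(n)]


# Toy stream: T depends on X1; X2 = X1 plus extra variation; X3 is unrelated.
cols = {
    "X1": walsh(1),
    "X2": [a + b for a, b in zip(walsh(1), walsh(4))],
    "X3": walsh(8),
    "T": [a + b for a, b in zip(walsh(1), walsh(2))],
}
print(stream_pc(["X2", "X1", "X3"], "T", cols))  # ['X1']
```

The toy columns are built from orthogonal Walsh sequences so the correlations are exact rather than sampled: X2 passes the relevance test on arrival (correlation 0.5 with T), is dropped once X1 arrives because its partial correlation with T given X1 is zero, and X3 fails the relevance test outright. Limiting conditioning sets to `max_k` members is one common way to keep the number of tests small; whether OFSVMB bounds them the same way is not stated in the abstract.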