Scalable teacher forcing network for semi-supervised large scale data streams
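Main Authors: | Pratama, Mahardhika; Za'in, Choiru; Lughofer, Edwin; Pardede, Eric; Rahayu, Dwi A. P. |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Journal Article |
Language: | English |
Published: | 2022 |
Subjects: | Engineering::Computer science and engineering; Evolving Fuzzy Systems; Concept Drifts |
Online Access: | https://hdl.handle.net/10356/159514 |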
_version_ | 1824454339556540416 |
---|---|
author | Pratama, Mahardhika Za'in, Choiru Lughofer, Edwin Pardede, Eric Rahayu, Dwi A. P. |
author2 | School of Computer Science and Engineering |
collection | NTU |
description | The large-scale data stream problem refers to high-speed information flows that cannot be processed in a scalable manner on a traditional computing platform. This problem also imposes an expensive labelling cost, making the deployment of fully supervised algorithms infeasible. However, semi-supervised learning from large-scale data streams has been little explored in the literature, because most existing works are designed for traditional single-node computing environments and are fully supervised. This paper offers the Weakly Supervised Scalable Teacher Forcing Network (WeScatterNet) to cope with the scarcity of labelled samples and large-scale data streams simultaneously. WeScatterNet is built on the Apache Spark distributed computing platform and uses a data-free model fusion strategy to compress the model after the parallel computing stage. It features an open network structure to address both global and local drift, and integrates a data augmentation, annotation and auto-correction (DA3) method for handling partially labelled data streams. WeScatterNet is evaluated numerically on six large-scale data stream problems with a label proportion of only 25%. It shows highly competitive performance even when compared with fully supervised learners trained with 100% of the labels. |
first_indexed | 2025-02-19T03:20:45Z |
format | Journal Article |
id | ntu-10356/159514 |
institution | Nanyang Technological University |
language | English |
last_indexed | 2025-02-19T03:20:45Z |
publishDate | 2022 |
record_format | dspace |
spelling | ntu-10356/159514 (record updated 2022-06-24T07:00:07Z). Funding: Ministry of Education (MOE). This work is supported by the Ministry of Education, Republic of Singapore, under a Tier 1 research grant. The third author acknowledges support from the 'LCM - K2 Center for Symbiotic Mechatronics' within the framework of the Austrian COMET-K2 program. Citation: Pratama, M., Za'in, C., Lughofer, E., Pardede, E. & Rahayu, D. A. P. (2021). Scalable teacher forcing network for semi-supervised large scale data streams. Information Sciences, 576, 407-431. https://dx.doi.org/10.1016/j.ins.2021.06.075. ISSN: 0020-0255. DOI: 10.1016/j.ins.2021.06.075. Scopus EID: 2-s2.0-85109455526. © 2021 Elsevier Inc. All rights reserved. |
title | Scalable teacher forcing network for semi-supervised large scale data streams |
topic | Engineering::Computer science and engineering; Evolving Fuzzy Systems; Concept Drifts |
url | https://hdl.handle.net/10356/159514 |