Distance variable improvement of time-series big data stream evaluation
Abstract Real-time information mining of a big dataset consisting of time series data is a very challenging task. For this purpose, we propose using the mean distance and the standard deviation to enhance the accuracy of the existing fast incremental model tree with the drift detection (FIMT-DD) alg...
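Main Authors: | Ari Wibisono, Petrus Mursanto, Jihan Adibah, Wendy D. W. T. Bayu, May Iffah Rizki, Lintang Matahari Hasani, Valian Fil Ahli |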
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen 2020-10-01 |
Series: | Journal of Big Data |
Subjects: | Intelligent Systems, Data stream, Distance improvement, Big data regression |
Online Access: | http://link.springer.com/article/10.1186/s40537-020-00359-w |
_version_ | 1818279195065712640 |
---|---|
author | Ari Wibisono; Petrus Mursanto; Jihan Adibah; Wendy D. W. T. Bayu; May Iffah Rizki; Lintang Matahari Hasani; Valian Fil Ahli |
author_facet | Ari Wibisono; Petrus Mursanto; Jihan Adibah; Wendy D. W. T. Bayu; May Iffah Rizki; Lintang Matahari Hasani; Valian Fil Ahli |
author_sort | Ari Wibisono |
collection | DOAJ |
description | Abstract Real-time information mining of a big dataset consisting of time series data is a very challenging task. For this purpose, we propose using the mean distance and the standard deviation to enhance the accuracy of the existing fast incremental model tree with drift detection (FIMT-DD) algorithm. The standard FIMT-DD algorithm uses the Hoeffding bound as its splitting criterion. We additionally use the mean distance and the standard deviation so that the tree is split more accurately than with the standard method. We verify our proposed method using the large Traffic Demand Dataset, which consists of 4,000,000 instances; Tennet’s big wind power plant dataset, which consists of 435,268 instances; and a road weather dataset, which consists of 30,000,000 instances. The results show that our proposed FIMT-DD algorithm improves accuracy compared with the standard method and the Chernoff bound approach. The measured errors demonstrate that our approach yields a lower Mean Absolute Percentage Error (MAPE) at every stage of learning, by approximately 2.49% compared with the Chernoff bound method and 19.65% compared with the standard method. |
first_indexed | 2024-12-12T23:29:28Z |
format | Article |
id | doaj.art-b10356c01ec14fea9a714ac62335aec6 |
institution | Directory Open Access Journal |
issn | 2196-1115 |
language | English |
last_indexed | 2024-12-12T23:29:28Z |
publishDate | 2020-10-01 |
publisher | SpringerOpen |
record_format | Article |
series | Journal of Big Data |
spelling | doaj.art-b10356c01ec14fea9a714ac62335aec6 2022-12-22T00:07:52Z eng SpringerOpen Journal of Big Data 2196-1115 2020-10-01 7 1 1 13 10.1186/s40537-020-00359-w Distance variable improvement of time-series big data stream evaluation Ari Wibisono (0); Petrus Mursanto (1); Jihan Adibah (2); Wendy D. W. T. Bayu (3); May Iffah Rizki (4); Lintang Matahari Hasani (5); Valian Fil Ahli (6); all authors: Faculty of Computer Science, Universitas Indonesia, Indonesia, Kampus, UI Depok http://link.springer.com/article/10.1186/s40537-020-00359-w Intelligent Systems; Data stream; Distance improvement; Big data regression |
title | Distance variable improvement of time-series big data stream evaluation |
topic | Intelligent Systems Data stream Distance improvement Big data regression |
url | http://link.springer.com/article/10.1186/s40537-020-00359-w |
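
The description in this record states that the standard FIMT-DD algorithm uses the Hoeffding bound as its splitting criterion and that errors are reported as MAPE. As a minimal sketch, assuming the usual form of the Hoeffding bound and a ratio test between the two best candidate splits (range taken as 1), the Python below illustrates both calculations. The function names, the default `delta`, and the ratio test are illustrative assumptions, not the authors' code. The mean-distance and standard-deviation extension is not shown because this record does not describe how it is computed.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R) is the range of the observed variable, delta the
    allowed error probability, n the number of instances seen so far.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_merit, second_merit, n, delta=1e-6):
    """Assumed ratio test: split once the best candidate is reliably better.

    best_merit and second_merit are the scores of the two best candidate
    splits (hypothetical inputs). The ratio lies in [0, 1], so R = 1.
    """
    if best_merit <= 0:
        return False
    ratio = second_merit / best_merit
    return ratio + hoeffding_bound(1.0, delta, n) < 1.0


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent, skipping zero targets."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)
```

The bound shrinks as more instances arrive, so a node that cannot be split confidently after 100 instances may become splittable later in the stream. This is why such a criterion suits one-pass learning over very large datasets like those listed in the description.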