Distance variable improvement of time-series big data stream evaluation

Abstract Real-time information mining of a big dataset consisting of time series data is a very challenging task. For this purpose, we propose using the mean distance and the standard deviation to enhance the accuracy of the existing fast incremental model trees with drift detection (FIMT-DD) alg...


Bibliographic Details
Main Authors: Ari Wibisono, Petrus Mursanto, Jihan Adibah, Wendy D. W. T. Bayu, May Iffah Rizki, Lintang Matahari Hasani, Valian Fil Ahli
Format: Article
Language: English
Published: SpringerOpen 2020-10-01
Series: Journal of Big Data
Subjects: Intelligent Systems; Data stream; Distance improvement; Big data regression
Online Access: http://link.springer.com/article/10.1186/s40537-020-00359-w
collection DOAJ
description Abstract Real-time information mining of a big dataset consisting of time series data is a very challenging task. For this purpose, we propose using the mean distance and the standard deviation to enhance the accuracy of the existing fast incremental model trees with drift detection (FIMT-DD) algorithm. The standard FIMT-DD algorithm uses the Hoeffding bound as its splitting criterion. We propose additionally using the mean distance and standard deviation so that the tree is split more accurately than with the standard method. We verify our proposed method using the large Traffic Demand Dataset, which consists of 4,000,000 instances; Tennet’s big wind power plant dataset, which consists of 435,268 instances; and a road weather dataset, which consists of 30,000,000 instances. The results show that our proposed FIMT-DD algorithm is more accurate than both the standard method and the Chernoff bound approach. The measured errors demonstrate that our approach yields a lower Mean Absolute Percentage Error (MAPE) at every stage of learning: approximately 2.49% lower than the Chernoff bound method and 19.65% lower than the standard method.
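The two quantities the abstract relies on, the Hoeffding bound used as the standard FIMT-DD splitting criterion and the MAPE used to report error, can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not include their mean-distance and standard-deviation extension. It assumes the commonly described form of the test, in which the ratio of the second-best to the best split score lies in [0, 1] and a split is accepted once that ratio plus the bound falls below 1. The function names, the `delta` confidence parameter, and the sample values are hypothetical.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R) is the range of the observed variable, delta is the
    allowed failure probability, and n is the number of observations.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_score, second_score, delta, n):
    """Assumed split test: accept the best split when the ratio of the
    second-best score to the best score, plus epsilon, is below 1.

    The ratio lies in [0, 1], so the range R passed to the bound is 1.
    """
    if best_score <= 0:
        return False  # no usable candidate split
    ratio = second_score / best_score
    return ratio + hoeffding_bound(1.0, delta, n) < 1.0


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Assumes equal-length sequences and no zero values in `actual`.
    """
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the functions.
    print(hoeffding_bound(1.0, 0.01, 1000))
    print(should_split(10.0, 8.0, 0.01, 1000))
    print(mape([100.0, 200.0], [110.0, 180.0]))
```

Because the bound shrinks as `n` grows, a leaf that cannot yet separate two close candidate splits may be able to do so after more instances arrive, which is what makes the test suitable for streams.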
id doaj.art-b10356c01ec14fea9a714ac62335aec6
institution Directory Open Access Journal
issn 2196-1115
affiliation Faculty of Computer Science, Universitas Indonesia, Indonesia, Kampus, UI Depok (all authors)
doi 10.1186/s40537-020-00359-w
topic Intelligent Systems
Data stream
Distance improvement
Big data regression