A buffer-based online clustering for evolving data stream

Data stream clustering plays an important role in data stream mining for knowledge extraction. Numerous researchers have recently studied density-based clustering algorithms due to their capability to generate arbitrarily shaped clusters. However, most of the algorithms are either fully offline, hyb...

Bibliographic Details
Main Authors: Islam, Md. Kamrul; Ahmed, Md. Manjur; Kamal Z., Zamli
Format: Article
Language: English
Published: Elsevier Ltd 2019
Subjects: QA75 Electronic computers. Computer science
Online Access: http://umpir.ump.edu.my/id/eprint/24676/1/A%20buffer-based%20online%20clustering%20for%20evolving%20data%20stream.pdf
author Islam, Md. Kamrul
Ahmed, Md. Manjur
Kamal Z., Zamli
collection UMP
description Data stream clustering plays an important role in data stream mining for knowledge extraction. Numerous researchers have recently studied density-based clustering algorithms because they can generate arbitrarily shaped clusters. However, most of these algorithms are fully offline or hybrid online/offline, or cannot handle the evolving nature of data streams. Recently, a fully online clustering algorithm for evolving data streams, called CEDAS, was proposed. However, like other density-based clustering algorithms, CEDAS requires the globally optimal micro-cluster radius to be defined in advance, which is a difficult task, and an erroneous choice degrades clustering performance. Moreover, the algorithm ignores temporarily irrelevant micro-clusters, which may become relevant in the future. In this study, we present a fully online density-based clustering algorithm called buffer-based online clustering for evolving data stream (BOCEDS). The algorithm recursively updates the micro-cluster radius to its local optimum. It also introduces a buffer for storing irrelevant micro-clusters and a fully online pruning method for extracting temporarily irrelevant micro-clusters from the buffer. In addition, BOCEDS introduces an online micro-cluster energy-updating function based on the spatial information of the data stream. Experimental results are compared with those of CEDAS and other hybrid online/offline density-based clustering algorithms, and BOCEDS outperforms these algorithms. The sensitivity of the clustering parameters is also measured. The proposed algorithm is then applied to real-world weather data streams to demonstrate its ability to detect changes in the data stream and discover arbitrarily shaped clusters. BOCEDS is available at https://sites.google.com/view/md-manjur-ahmed and https://sites.google.com/view/kamrul-just.
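The buffer idea in the description can be illustrated with a toy sketch. This is not the BOCEDS algorithm from the paper: the class `BufferedClusterer`, the parameters `radius`, `decay`, and `buffer_span`, and the linear energy decay are all assumptions made for illustration, and the recursive radius update and the spatially informed energy function mentioned above are deliberately left out (the radius stays fixed). The sketch only shows how micro-clusters whose energy runs out might be parked in a buffer, revived if a later point falls inside them, and pruned once they have been idle for too long.

```python
import math
from dataclasses import dataclass


@dataclass(eq=False)  # identity comparison, so list.remove() targets this exact object
class MicroCluster:
    center: list
    radius: float
    count: int = 1
    energy: float = 1.0


class BufferedClusterer:
    """Toy online clusterer with an energy-based buffer (illustrative only)."""

    def __init__(self, radius=1.0, decay=0.1, buffer_span=1.0):
        self.radius = radius            # fixed radius (assumption; no adaptation here)
        self.decay = decay              # energy lost per arriving point (assumption)
        self.buffer_span = buffer_span  # extra energy budget before pruning (assumption)
        self.active = []                # micro-clusters with positive energy
        self.buffer = []                # temporarily irrelevant micro-clusters

    @staticmethod
    def _nearest(pool, point):
        """Return the closest micro-cluster in pool that contains point, or None."""
        best, best_d = None, math.inf
        for mc in pool:
            d = math.dist(mc.center, point)
            if d <= mc.radius and d < best_d:
                best, best_d = mc, d
        return best

    def insert(self, point):
        # 1. Every known micro-cluster loses energy when a new point arrives.
        for mc in self.active + self.buffer:
            mc.energy -= self.decay
        # 2. Exhausted active micro-clusters move to the buffer; buffered ones
        #    that have been idle for too long are pruned for good.
        faded = [mc for mc in self.active if mc.energy <= 0]
        self.active = [mc for mc in self.active if mc.energy > 0]
        self.buffer = [mc for mc in self.buffer + faded
                       if mc.energy > -self.buffer_span]
        # 3. Look for a home among active micro-clusters first, then the buffer.
        mc = self._nearest(self.active, point)
        if mc is None:
            mc = self._nearest(self.buffer, point)
            if mc is not None:          # revive a buffered micro-cluster
                self.buffer.remove(mc)
                self.active.append(mc)
        if mc is None:
            # 4a. No match anywhere: start a new micro-cluster at the point.
            mc = MicroCluster(center=list(point), radius=self.radius)
            self.active.append(mc)
        else:
            # 4b. Match found: update the center as a running mean.
            mc.count += 1
            mc.center = [c + (p - c) / mc.count
                         for c, p in zip(mc.center, point)]
        mc.energy = 1.0                 # the touched micro-cluster is fully recharged
        return mc
```

With `decay=0.5`, for example, a micro-cluster that receives no points is moved to the buffer after two further arrivals; a point that later lands within its radius moves it back to the active list instead of creating a new micro-cluster.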
first_indexed 2024-03-06T12:32:21Z
format Article
id UMPir24676
institution Universiti Malaysia Pahang
language English
last_indexed 2024-03-06T12:32:21Z
publishDate 2019
publisher Elsevier Ltd
record_format dspace
spelling UMPir24676 2019-04-02T07:34:52Z http://umpir.ump.edu.my/id/eprint/24676/ Islam, Md. Kamrul and Ahmed, Md. Manjur and Kamal Z., Zamli (2019) A buffer-based online clustering for evolving data stream. Information Sciences, 489. pp. 113-135. ISSN 0020-0255. (Published) PeerReviewed https://doi.org/10.1016/j.ins.2019.03.022
title A buffer-based online clustering for evolving data stream
topic QA75 Electronic computers. Computer science
url http://umpir.ump.edu.my/id/eprint/24676/1/A%20buffer-based%20online%20clustering%20for%20evolving%20data%20stream.pdf