Concept Drift Detection Method for Multidimensional Data Stream Based on Clustering Partition

Analyzing and exploiting the latent information in data streams is an important part of data stream mining. Concept drift, in which the data distribution changes over time, is a major challenge for data stream mining. Detecting changes in the data distribution is a direct and effective method to dete...

Bibliographic Details
Main Authors: CHEN Yuan-yuan, WANG Zhi-hai
Format: Article
Language: zho
Published: Editorial office of Computer Science 2022-07-01
Series: Jisuanji kexue
Subjects: data stream mining; concept drift detection; k-means; hypothesis test; histogram
Online Access: https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-49-7-25.pdf
_version_ 1797845155314663424
author CHEN Yuan-yuan, WANG Zhi-hai
collection DOAJ
description Analyzing and exploiting the latent information in data streams is an important part of data stream mining. Concept drift, in which the data distribution changes over time, is a major challenge for data stream mining. Detecting changes in the data distribution is a direct and effective way to detect concept drift. Currently, some concept drift detection methods use a tree structure or a grid to build a histogram that describes the data distribution. However, tree structures tend to produce detection blind spots and have poor interpretability, while grid methods consume too much memory on multidimensional data. To address these problems, a concept drift detection method for multidimensional data streams, called partition based on uniform density clusters (PUDC), is proposed. The algorithm uses k-means to partition the data into regions of uniform density and applies the chi-square test to the statistics computed for each partition to detect concept drift. To verify the validity of the method, experiments were run on four artificial datasets and three real datasets, comparing type I and type II error rates across data of different dimensions. Experimental results show that PUDC outperforms several recent algorithms in concept drift detection on multidimensional data streams.
first_indexed 2024-04-09T17:35:18Z
format Article
id doaj.art-c1964bdc2a4f4ed1836b4518abc243ea
institution Directory Open Access Journal
issn 1002-137X
language zho
last_indexed 2024-04-09T17:35:18Z
publishDate 2022-07-01
publisher Editorial office of Computer Science
record_format Article
series Jisuanji kexue
spelling doaj.art-c1964bdc2a4f4ed1836b4518abc243ea | 2023-04-18T02:32:12Z | zho | Editorial office of Computer Science | Jisuanji kexue | 1002-137X | 2022-07-01 | 49(7): 25-30 | 10.11896/jsjkx.210600155 | Concept Drift Detection Method for Multidimensional Data Stream Based on Clustering Partition | CHEN Yuan-yuan, WANG Zhi-hai | School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China; Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China | Analyzing and exploiting the latent information in data streams is an important part of data stream mining. Concept drift, in which the data distribution changes over time, is a major challenge for data stream mining. Detecting changes in the data distribution is a direct and effective way to detect concept drift. Currently, some concept drift detection methods use a tree structure or a grid to build a histogram that describes the data distribution. However, tree structures tend to produce detection blind spots and have poor interpretability, while grid methods consume too much memory on multidimensional data. To address these problems, a concept drift detection method for multidimensional data streams, called partition based on uniform density clusters (PUDC), is proposed. The algorithm uses k-means to partition the data into regions of uniform density and applies the chi-square test to the statistics computed for each partition to detect concept drift. To verify the validity of the method, experiments were run on four artificial datasets and three real datasets, comparing type I and type II error rates across data of different dimensions. Experimental results show that PUDC outperforms several recent algorithms in concept drift detection on multidimensional data streams. | https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-49-7-25.pdf | data stream mining; concept drift detection; k-means; hypothesis test; histogram
title Concept Drift Detection Method for Multidimensional Data Stream Based on Clustering Partition
topic data stream mining|concept drift detection|k-means|hypothesis test|histogram
url https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-49-7-25.pdf
work_keys_str_mv AT chenyuanyuanwangzhihai conceptdriftdetectionmethodformultidimensionaldatastreambasedonclusteringpartition
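
The description above outlines the general idea: partition the feature space by clustering, count how many points of each window fall into each partition, and apply a chi-square test to those counts. The following is only a minimal sketch of that idea and not the authors' PUDC implementation. It assumes plain Lloyd's k-means in place of the paper's uniform-density partitioning (which the record does not specify), a goodness-of-fit form of the chi-square test, and a caller-supplied critical value. All function names and parameters are hypothetical.

```python
import math
import random


def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumption; not the paper's uniform-density step)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids


def partition_counts(points, centroids):
    """Histogram of points over the partitions induced by the centroids."""
    counts = [0] * len(centroids)
    for p in points:
        counts[nearest(p, centroids)] += 1
    return counts


def chi_square_statistic(ref_counts, new_counts):
    """Pearson statistic; expected counts are the reference proportions
    scaled to the size of the new window. Cells with zero expected
    count are skipped."""
    n_ref, n_new = sum(ref_counts), sum(new_counts)
    stat = 0.0
    for r, o in zip(ref_counts, new_counts):
        expected = r / n_ref * n_new
        if expected > 0:
            stat += (o - expected) ** 2 / expected
    return stat


def drift_detected(reference, window, centroids, critical_value):
    """Flag drift when the statistic exceeds the chosen critical value."""
    stat = chi_square_statistic(
        partition_counts(reference, centroids),
        partition_counts(window, centroids),
    )
    return stat > critical_value


# Hypothetical demo data: a 2-D reference window and a mean-shifted window.
rng = random.Random(42)
reference = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
shifted = [(rng.gauss(1.5, 1), rng.gauss(1.5, 1)) for _ in range(500)]

K = 4
CRITICAL = 7.815  # chi-square critical value for df = K - 1 = 3, alpha = 0.05
centroids = kmeans(reference, K)
print(drift_detected(reference, shifted, centroids, CRITICAL))
```

In this sketch the partitions are fitted once on the reference window and reused for every later window, so only one count per partition has to be kept rather than a full grid over all dimensions. The critical value must match the number of partitions and the significance level chosen.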