Concept Drift Detection Method for Multidimensional Data Stream Based on Clustering Partition

Analyzing and exploiting the latent information in data streams is an important part of data stream mining. Concept drift, in which the data distribution changes over time, is a major challenge for data stream mining. Detecting changes in the data distribution is a direct and effective way to dete...


Bibliographic Details
Main Author:	CHEN Yuan-yuan, WANG Zhi-hai
Format:	Article
Language:	zho
Published:	Editorial office of Computer Science 2022-07-01
Series:	Jisuanji kexue
Subjects:	data stream mining | concept drift detection | k-means | hypothesis test | histogram
Online Access:	https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-49-7-25.pdf

_version_	1797845155314663424
author	CHEN Yuan-yuan, WANG Zhi-hai
collection	DOAJ
description	Analyzing and exploiting the latent information in data streams is an important part of data stream mining. Concept drift, in which the data distribution changes over time, is a major challenge for data stream mining. Detecting changes in the data distribution is a direct and effective way to detect concept drift. Currently, some concept drift detection methods use a tree structure or a grid to build a histogram that describes the data distribution. However, a tree structure tends to produce detection blind spots and leads to poor interpretability, while a grid consumes too much memory on multi-dimensional data. To solve these problems, a concept drift detection method for multi-dimensional data streams called partition based on uniform density clusters (PUDC) is proposed. The algorithm uses k-means to partition the data into regions of uniform density and applies the chi-square test to the statistics computed for each partition to detect concept drift. To verify the validity of the method, four artificial datasets and three real datasets were selected for experiments, and the type I and type II error rates on data of different dimensions were compared and analyzed. Experimental results show that the PUDC algorithm outperforms several recent algorithms in concept drift detection on multi-dimensional data streams.
first_indexed	2024-04-09T17:35:18Z
format	Article
id	doaj.art-c1964bdc2a4f4ed1836b4518abc243ea
institution	Directory of Open Access Journals
issn	1002-137X
language	zho
last_indexed	2024-04-09T17:35:18Z
publishDate	2022-07-01
publisher	Editorial office of Computer Science
record_format	Article
series	Jisuanji kexue
spelling	doaj.art-c1964bdc2a4f4ed1836b4518abc243ea; 2023-04-18T02:32:12Z; zho; Editorial office of Computer Science; Jisuanji kexue; 1002-137X; 2022-07-01; vol. 49, no. 7, pp. 25-30; doi 10.11896/jsjkx.210600155; Concept Drift Detection Method for Multidimensional Data Stream Based on Clustering Partition; CHEN Yuan-yuan, WANG Zhi-hai (School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China; Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China); https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-49-7-25.pdf
title	Concept Drift Detection Method for Multidimensional Data Stream Based on Clustering Partition
topic	data stream mining | concept drift detection | k-means | hypothesis test | histogram
url	https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-49-7-25.pdf
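
The description above outlines the general idea of the method: partition the data with k-means, count how many points fall in each partition, and compare the counts with a chi-square test. The following is a minimal sketch of that idea only, not the published PUDC algorithm. It uses plain Lloyd's k-means as a stand-in for the uniform-density partitioning, a two-sample chi-square homogeneity statistic, and the Wilson-Hilferty approximation for the critical value. All function names, the window handling, and the defaults (`k=8`, `alpha=0.01`) are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (stand-in for uniform-density partitioning)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the old centroid when a cluster ends up empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def histogram(points, centroids):
    """Count how many points fall into each centroid's partition."""
    counts = [0] * len(centroids)
    for p in points:
        counts[nearest(p, centroids)] += 1
    return counts


def chi2_critical(df, alpha):
    """Wilson-Hilferty approximation to the upper-alpha chi-square quantile."""
    z = NormalDist().inv_cdf(1 - alpha)
    return df * (1 - 2 / (9 * df) + z * math.sqrt(2 / (9 * df))) ** 3


def drift_detected(reference, current, k=8, alpha=0.01, seed=0):
    """Return True if the two windows differ in their partition histograms."""
    centroids = kmeans(reference, k, seed=seed)
    ref = histogram(reference, centroids)
    cur = histogram(current, centroids)
    n_ref, n_cur = sum(ref), sum(cur)
    a, b = math.sqrt(n_cur / n_ref), math.sqrt(n_ref / n_cur)
    # Ignore partitions that are empty in both windows.
    used = [(r, c) for r, c in zip(ref, cur) if r + c > 0]
    stat = sum((a * r - b * c) ** 2 / (r + c) for r, c in used)
    df = len(used) - 1
    return df > 0 and stat > chi2_critical(df, alpha)


if __name__ == "__main__":
    rng = random.Random(1)
    reference = [tuple(rng.gauss(0, 1) for _ in range(3)) for _ in range(500)]
    shifted = [tuple(rng.gauss(3, 1) for _ in range(3)) for _ in range(500)]
    print(drift_detected(reference, shifted))
```

In this sketch the partitions are fitted once on the reference window and reused for the current window, so the test compares two histograms over the same regions. Memory use grows with the number of centroids rather than with a grid over every dimension.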


Similar Items