Concept Drift Detection Method for Multidimensional Data Stream Based on Clustering Partition
Analyzing and exploiting the latent information in data streams is an important part of data stream mining. Concept drift, in which the data distribution changes over time, is a major challenge for data stream mining. Detecting changes in the data distribution is a direct and effective method to dete...
Main Author: | CHEN Yuan-yuan, WANG Zhi-hai |
---|---|
Format: | Article |
Language: | Chinese (zho) |
Published: | Editorial office of Computer Science, 2022-07-01 |
Series: | Jisuanji kexue |
Subjects: | data stream mining; concept drift detection; <i>k</i>-means; hypothesis test; histogram |
Online Access: | https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-49-7-25.pdf |
_version_ | 1797845155314663424 |
---|---|
author | CHEN Yuan-yuan, WANG Zhi-hai |
collection | DOAJ |
description | Analyzing and exploiting the latent information in data streams is an important part of data stream mining. Concept drift, in which the data distribution changes over time, is a major challenge for data stream mining. Detecting changes in the data distribution is a direct and effective way to detect concept drift. Some current concept drift detection methods use a tree structure or a grid to build a histogram that describes the data distribution. However, tree structures tend to produce detection blind spots and have poor interpretability, while grid methods consume too much memory on multi-dimensional data. To address these problems, a concept drift detection method for multi-dimensional data streams, called partition based on uniform density clusters (PUDC), is proposed. The algorithm uses <i>k</i>-means to partition the data into clusters of uniform density and applies the chi-square test to statistics computed for each partition to detect concept drift. To verify the validity of the method, experiments were run on four artificial datasets and three real datasets, and the type I and type II error rates were compared and analyzed for data of different dimensionality. The experimental results show that PUDC outperforms several recent algorithms at detecting concept drift in multi-dimensional data streams. |
first_indexed | 2024-04-09T17:35:18Z |
format | Article |
id | doaj.art-c1964bdc2a4f4ed1836b4518abc243ea |
institution | Directory Open Access Journal |
issn | 1002-137X |
language | zho |
last_indexed | 2024-04-09T17:35:18Z |
publishDate | 2022-07-01 |
publisher | Editorial office of Computer Science |
record_format | Article |
series | Jisuanji kexue |
spelling | doaj.art-c1964bdc2a4f4ed1836b4518abc243ea; 2023-04-18T02:32:12Z; zho; Editorial office of Computer Science; Jisuanji kexue; ISSN 1002-137X; 2022-07-01; vol. 49, no. 7, pp. 25-30; doi:10.11896/jsjkx.210600155; Concept Drift Detection Method for Multidimensional Data Stream Based on Clustering Partition; CHEN Yuan-yuan, WANG Zhi-hai; School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China; Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China; https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-49-7-25.pdf; data stream mining; concept drift detection; <i>k</i>-means; hypothesis test; histogram |
title | Concept Drift Detection Method for Multidimensional Data Stream Based on Clustering Partition |
topic | data stream mining; concept drift detection; <i>k</i>-means; hypothesis test; histogram |
url | https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-49-7-25.pdf |
work_keys_str_mv | AT chenyuanyuanwangzhihai conceptdriftdetectionmethodformultidimensionaldatastreambasedonclusteringpartition |
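
The description above outlines a two-step idea: partition the data with <i>k</i>-means, then run a chi-square test on the per-partition counts. The following is a minimal sketch of that general idea, not the authors' PUDC implementation. Plain Lloyd's <i>k</i>-means stands in for the paper's uniform-density partitioning, which the record does not specify, and the critical value uses the Wilson-Hilferty approximation so the example needs only the standard library. All function names (`kmeans`, `partition_counts`, `detect_drift`, and so on) and the defaults `k=4` and `alpha=0.05` are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed stand-in for the uniform-density partition)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep it if the group is empty.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids


def partition_counts(points, centroids):
    """Histogram of how many points fall into each cluster partition."""
    counts = [0] * len(centroids)
    for p in points:
        counts[nearest(p, centroids)] += 1
    return counts


def chi_square_statistic(ref_counts, new_counts):
    """Pearson statistic for the 2 x k table; returns (statistic, degrees of freedom)."""
    n_ref, n_new = sum(ref_counts), sum(new_counts)
    total = n_ref + n_new
    stat, used = 0.0, 0
    for r, w in zip(ref_counts, new_counts):
        col = r + w
        if col == 0:
            continue  # a partition empty in both samples carries no information
        used += 1
        for observed, row_total in ((r, n_ref), (w, n_new)):
            expected = row_total * col / total
            stat += (observed - expected) ** 2 / expected
    return stat, max(used - 1, 1)


def chi_square_critical(df, alpha):
    """Approximate upper-alpha chi-square quantile (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(1 - alpha)
    a = 2 / (9 * df)
    return df * (1 - a + z * math.sqrt(a)) ** 3


def detect_drift(reference, window, k=4, alpha=0.05):
    """Partition on the reference sample, then compare partition counts of both samples."""
    centroids = kmeans(reference, k)
    stat, df = chi_square_statistic(
        partition_counts(reference, centroids), partition_counts(window, centroids)
    )
    return stat > chi_square_critical(df, alpha), stat
```

A call such as `detect_drift(reference, window)` takes two non-empty lists of equal-length numeric tuples and returns a flag together with the test statistic. In a streaming setting the window would slide over incoming data while the reference sample is refreshed after each detected drift; that bookkeeping is left out here.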