Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R

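In recent years, data streams have become an increasingly important area of research for the computer science, database and statistics communities. Data streams are ordered and potentially unbounded sequences of data points created by a typically non-stationary data generating process. Common data mining tasks associated with data streams include clustering, classification and frequent pattern mining. New algorithms for these types of data are proposed regularly and it is important to evaluate them thoroughly under standardized conditions. In this paper we introduce stream, a research tool that includes modeling and simulating data streams as well as an extensible framework for implementing, interfacing and experimenting with algorithms for various data stream mining tasks. The main advantage of stream is that it seamlessly integrates with the large existing infrastructure provided by R. In addition to data handling, plotting and easy scripting capabilities, R also provides many existing algorithms and enables users to interface code written in many programming languages popular among data mining researchers (e.g., C/C++, Java and Python). In this paper we describe the architecture of stream and focus on its use for data stream clustering research. stream was implemented with extensibility in mind and will be extended in the future to cover additional data stream mining tasks like classification and frequent pattern mining.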

Bibliographic Details
Main Authors:	Michael Hahsler, Matthew Bolaños, John Forrest
Format:	Article
Language:	English
Published:	Foundation for Open Access Statistics 2017-02-01
Series:	Journal of Statistical Software
Subjects:	data streams; data mining; clustering
Online Access:	https://www.jstatsoft.org/index.php/jss/article/view/3047

author	Michael Hahsler, Matthew Bolaños, John Forrest
collection	DOAJ
description	In recent years, data streams have become an increasingly important area of research for the computer science, database and statistics communities. Data streams are ordered and potentially unbounded sequences of data points created by a typically non-stationary data generating process. Common data mining tasks associated with data streams include clustering, classification and frequent pattern mining. New algorithms for these types of data are proposed regularly and it is important to evaluate them thoroughly under standardized conditions. In this paper we introduce stream, a research tool that includes modeling and simulating data streams as well as an extensible framework for implementing, interfacing and experimenting with algorithms for various data stream mining tasks. The main advantage of stream is that it seamlessly integrates with the large existing infrastructure provided by R. In addition to data handling, plotting and easy scripting capabilities, R also provides many existing algorithms and enables users to interface code written in many programming languages popular among data mining researchers (e.g., C/C++, Java and Python). In this paper we describe the architecture of stream and focus on its use for data stream clustering research. stream was implemented with extensibility in mind and will be extended in the future to cover additional data stream mining tasks like classification and frequent pattern mining.
first_indexed	2024-04-14T01:35:21Z
format	Article
id	doaj.art-04343148f8734bb2bef89caf16fe1727
institution	Directory Open Access Journal
issn	1548-7660
language	English
last_indexed	2024-04-14T01:35:21Z
publishDate	2017-02-01
publisher	Foundation for Open Access Statistics
record_format	Article
series	Journal of Statistical Software
spelling	doaj.art-04343148f8734bb2bef89caf16fe1727; 2022-12-22T02:19:59Z; eng; Foundation for Open Access Statistics; Journal of Statistical Software; ISSN 1548-7660; 2017-02-01; DOI 10.18637/jss.v076.i14; https://www.jstatsoft.org/index.php/jss/article/view/3047
title	Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R
topic	data streams; data mining; clustering
url	https://www.jstatsoft.org/index.php/jss/article/view/3047
