Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R

In recent years, data streams have become an increasingly important area of research for the computer science, database and statistics communities. Data streams are ordered and potentially unbounded sequences of data points created by a typically non-stationary data generating process. Common data m...


Bibliographic Details
Main Authors: Michael Hahsler, Matthew Bolaños, John Forrest
Format: Article
Language: English
Published: Foundation for Open Access Statistics 2017-02-01
Series: Journal of Statistical Software
Subjects: data streams, data mining, clustering
Online Access: https://www.jstatsoft.org/index.php/jss/article/view/3047
author Michael Hahsler
Matthew Bolaños
John Forrest
collection DOAJ
description In recent years, data streams have become an increasingly important area of research for the computer science, database and statistics communities. Data streams are ordered and potentially unbounded sequences of data points created by a typically non-stationary data generating process. Common data mining tasks associated with data streams include clustering, classification and frequent pattern mining. New algorithms for these types of data are proposed regularly and it is important to evaluate them thoroughly under standardized conditions. In this paper we introduce stream, a research tool that includes modeling and simulating data streams as well as an extensible framework for implementing, interfacing and experimenting with algorithms for various data stream mining tasks. The main advantage of stream is that it seamlessly integrates with the large existing infrastructure provided by R. In addition to data handling, plotting and easy scripting capabilities, R also provides many existing algorithms and enables users to interface code written in many programming languages popular among data mining researchers (e.g., C/C++, Java and Python). In this paper we describe the architecture of stream and focus on its use for data stream clustering research. stream was implemented with extensibility in mind and will be extended in the future to cover additional data stream mining tasks like classification and frequent pattern mining.
first_indexed 2024-04-14T01:35:21Z
format Article
id doaj.art-04343148f8734bb2bef89caf16fe1727
institution Directory Open Access Journal
issn 1548-7660
language English
last_indexed 2024-04-14T01:35:21Z
publishDate 2017-02-01
publisher Foundation for Open Access Statistics
record_format Article
series Journal of Statistical Software
spelling doaj.art-04343148f8734bb2bef89caf16fe1727; 2022-12-22T02:19:59Z; eng; Foundation for Open Access Statistics; Journal of Statistical Software; 1548-7660; 2017-02-01; 76115010.18637/jss.v076.i141090; Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R; Michael Hahsler; Matthew Bolaños; John Forrest; https://www.jstatsoft.org/index.php/jss/article/view/3047; data streams; data mining; clustering
title Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R
topic data streams
data mining
clustering
url https://www.jstatsoft.org/index.php/jss/article/view/3047