Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R
In recent years, data streams have become an increasingly important area of research for the computer science, database and statistics communities. Data streams are ordered and potentially unbounded sequences of data points created by a typically non-stationary data generating process. Common data m...
Main Authors: | Michael Hahsler, Matthew Bolaños, John Forrest |
---|---|
Format: | Article |
Language: | English |
Published: | Foundation for Open Access Statistics, 2017-02-01 |
Series: | Journal of Statistical Software |
Subjects: | data streams, data mining, clustering |
Online Access: | https://www.jstatsoft.org/index.php/jss/article/view/3047 |
_version_ | 1817993104700997632 |
---|---|
author | Michael Hahsler Matthew Bolaños John Forrest |
author_facet | Michael Hahsler Matthew Bolaños John Forrest |
author_sort | Michael Hahsler |
collection | DOAJ |
description | In recent years, data streams have become an increasingly important area of research for the computer science, database and statistics communities. Data streams are ordered and potentially unbounded sequences of data points created by a typically non-stationary data generating process. Common data mining tasks associated with data streams include clustering, classification and frequent pattern mining. New algorithms for these types of data are proposed regularly and it is important to evaluate them thoroughly under standardized conditions. In this paper we introduce stream, a research tool that includes modeling and simulating data streams as well as an extensible framework for implementing, interfacing and experimenting with algorithms for various data stream mining tasks. The main advantage of stream is that it seamlessly integrates with the large existing infrastructure provided by R. In addition to data handling, plotting and easy scripting capabilities, R also provides many existing algorithms and enables users to interface code written in many programming languages popular among data mining researchers (e.g., C/C++, Java and Python). In this paper we describe the architecture of stream and focus on its use for data stream clustering research. stream was implemented with extensibility in mind and will be extended in the future to cover additional data stream mining tasks like classification and frequent pattern mining. |
first_indexed | 2024-04-14T01:35:21Z |
format | Article |
id | doaj.art-04343148f8734bb2bef89caf16fe1727 |
institution | Directory of Open Access Journals |
issn | 1548-7660 |
language | English |
last_indexed | 2024-04-14T01:35:21Z |
publishDate | 2017-02-01 |
publisher | Foundation for Open Access Statistics |
record_format | Article |
series | Journal of Statistical Software |
spelling | doaj.art-04343148f8734bb2bef89caf16fe1727 2022-12-22T02:19:59Z eng Foundation for Open Access Statistics Journal of Statistical Software 1548-7660 2017-02-01 76 1 1 50 10.18637/jss.v076.i14 1090 Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R Michael Hahsler Matthew Bolaños John Forrest In recent years, data streams have become an increasingly important area of research for the computer science, database and statistics communities. Data streams are ordered and potentially unbounded sequences of data points created by a typically non-stationary data generating process. Common data mining tasks associated with data streams include clustering, classification and frequent pattern mining. New algorithms for these types of data are proposed regularly and it is important to evaluate them thoroughly under standardized conditions. In this paper we introduce stream, a research tool that includes modeling and simulating data streams as well as an extensible framework for implementing, interfacing and experimenting with algorithms for various data stream mining tasks. The main advantage of stream is that it seamlessly integrates with the large existing infrastructure provided by R. In addition to data handling, plotting and easy scripting capabilities, R also provides many existing algorithms and enables users to interface code written in many programming languages popular among data mining researchers (e.g., C/C++, Java and Python). In this paper we describe the architecture of stream and focus on its use for data stream clustering research. stream was implemented with extensibility in mind and will be extended in the future to cover additional data stream mining tasks like classification and frequent pattern mining. https://www.jstatsoft.org/index.php/jss/article/view/3047 data streams data mining clustering |
spellingShingle | Michael Hahsler Matthew Bolaños John Forrest Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R Journal of Statistical Software data streams data mining clustering |
title | Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R |
title_full | Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R |
title_fullStr | Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R |
title_full_unstemmed | Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R |
title_short | Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R |
title_sort | introduction to stream an extensible framework for data stream clustering research with r |
topic | data streams data mining clustering |
url | https://www.jstatsoft.org/index.php/jss/article/view/3047 |
work_keys_str_mv | AT michaelhahsler introductiontostreamanextensibleframeworkfordatastreamclusteringresearchwithr AT matthewbolanos introductiontostreamanextensibleframeworkfordatastreamclusteringresearchwithr AT johnforrest introductiontostreamanextensibleframeworkfordatastreamclusteringresearchwithr |