Optimization Algorithms for Scalable Stream Batch Clustering with k Estimation

The increasing volume and velocity of the continuously generated data (data stream) challenge machine learning algorithms, which must evolve to fit real-world problems. The data stream clustering algorithms face issues such as the rapidly increasing volume of the data, the variety of the number of c...

Bibliographic Details
Main Authors:	Paulo Gustavo Lopes Cândido, Jonathan Andrade Silva, Elaine Ribeiro Faria, Murilo Coelho Naldi
Format:	Article
Language:	English
Published:	MDPI AG 2022-06-01
Series:	Applied Sciences
Subjects:	machine learning; clustering; data stream; massive parallel computation
Online Access:	https://www.mdpi.com/2076-3417/12/13/6464

_version_	1797481042675761152
author	Paulo Gustavo Lopes Cândido Jonathan Andrade Silva Elaine Ribeiro Faria Murilo Coelho Naldi
author_facet	Paulo Gustavo Lopes Cândido Jonathan Andrade Silva Elaine Ribeiro Faria Murilo Coelho Naldi
author_sort	Paulo Gustavo Lopes Cândido
collection	DOAJ
description	The increasing volume and velocity of the continuously generated data (data stream) challenge machine learning algorithms, which must evolve to fit real-world problems. The data stream clustering algorithms face issues such as the rapidly increasing volume of the data, the variety of the number of clusters, and their shapes. The present work aims to improve the accuracy of sequential clustering batches of data streams for scenarios in which clusters evolve dynamically and continuously, automatically estimating their number. In order to achieve this goal, three evolutionary algorithms are presented, along with three novel algorithms designed to deal with clusters of normal distribution based on goodness-of-fit tests in the context of scalable batch stream clustering with automatic estimation of the number of clusters. All of them are developed on top of MapReduce, Discretized-Stream models, and the most recent MPC frameworks to provide scalability, reliability, resilience, and flexibility. The proposed algorithms are experimentally compared with state-of-the-art methods and present the best results for accuracy for normally distributed data sets, reaching their goal.
first_indexed	2024-03-09T22:08:49Z
format	Article
id	doaj.art-cc8ef564322143f59b977dc03886d737
institution	Directory Open Access Journal
issn	2076-3417
language	English
last_indexed	2024-03-09T22:08:49Z
publishDate	2022-06-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-cc8ef564322143f59b977dc03886d737; 2023-11-23T19:37:05Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2022-06-01; 12; 13; 6464; 10.3390/app12136464; Optimization Algorithms for Scalable Stream Batch Clustering with k Estimation; Paulo Gustavo Lopes Cândido (0); Jonathan Andrade Silva (1); Elaine Ribeiro Faria (2); Murilo Coelho Naldi (3); Department of Informatics, Federal University of Viçosa, Viçosa 35690-000, MG, Brazil; Faculty of Computer Science, Federal University of Mato Grosso do Sul, Campo Grande 79070-900, MS, Brazil; Faculty of Computer Science, Federal University of Uberlândia, Uberlândia 38408-100, MG, Brazil; Department of Computer Science, Federal University of São Carlos, São Carlos 13565-905, SP, Brazil; The increasing volume and velocity of the continuously generated data (data stream) challenge machine learning algorithms, which must evolve to fit real-world problems. The data stream clustering algorithms face issues such as the rapidly increasing volume of the data, the variety of the number of clusters, and their shapes. The present work aims to improve the accuracy of sequential clustering batches of data streams for scenarios in which clusters evolve dynamically and continuously, automatically estimating their number. In order to achieve this goal, three evolutionary algorithms are presented, along with three novel algorithms designed to deal with clusters of normal distribution based on goodness-of-fit tests in the context of scalable batch stream clustering with automatic estimation of the number of clusters. All of them are developed on top of MapReduce, Discretized-Stream models, and the most recent MPC frameworks to provide scalability, reliability, resilience, and flexibility. The proposed algorithms are experimentally compared with state-of-the-art methods and present the best results for accuracy for normally distributed data sets, reaching their goal.; https://www.mdpi.com/2076-3417/12/13/6464; machine learning; clustering; data stream; massive parallel computation
spellingShingle	Paulo Gustavo Lopes Cândido Jonathan Andrade Silva Elaine Ribeiro Faria Murilo Coelho Naldi Optimization Algorithms for Scalable Stream Batch Clustering with k Estimation Applied Sciences machine learning clustering data stream massive parallel computation
title	Optimization Algorithms for Scalable Stream Batch Clustering with k Estimation
title_full	Optimization Algorithms for Scalable Stream Batch Clustering with k Estimation
title_fullStr	Optimization Algorithms for Scalable Stream Batch Clustering with k Estimation
title_full_unstemmed	Optimization Algorithms for Scalable Stream Batch Clustering with k Estimation
title_short	Optimization Algorithms for Scalable Stream Batch Clustering with k Estimation
title_sort	optimization algorithms for scalable stream batch clustering with k estimation
topic	machine learning; clustering; data stream; massive parallel computation
url	https://www.mdpi.com/2076-3417/12/13/6464
work_keys_str_mv	AT paulogustavolopescandido optimizationalgorithmsforscalablestreambatchclusteringwithkestimation AT jonathanandradesilva optimizationalgorithmsforscalablestreambatchclusteringwithkestimation AT elaineribeirofaria optimizationalgorithmsforscalablestreambatchclusteringwithkestimation AT murilocoelhonaldi optimizationalgorithmsforscalablestreambatchclusteringwithkestimation
