The least sample size essential for detecting changes in clustering solutions of streaming datasets.

Cluster analysis treats multivariate data tuples as objects and groups them into clusters based on their similarities or dissimilarities within the dataset. In the modern world, however, a significant volume of data is continuously generated from diverse sources over time. In these dynam...

Bibliographic Details
Main Authors: Muhammad Atif, Muhammad Farooq, Mohammad Abiad, Muhammad Shafiq
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2024-01-01
Series: PLoS ONE
Online Access: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0297355&type=printable
collection DOAJ
description Cluster analysis treats multivariate data tuples as objects and groups them into clusters based on their similarities or dissimilarities within the dataset. In the modern world, however, a significant volume of data is continuously generated from diverse sources over time. In these dynamic scenarios, the data are not static but continually evolve. Consequently, the interesting patterns and inherent subgroups within the datasets also change and develop over time. Researchers have paid special attention to monitoring changes in the cluster solutions of evolving streams, and several algorithms have been proposed in the literature for this purpose. However, to date, no study has examined the effect of variability in cluster sizes on the evolution of cluster solutions. Moreover, no guidance is available on how cluster sizes affect the types of change that clusters undergo in the streams. In the present simulation study using artificial datasets, the evolution of clusters is examined with respect to the variability in cluster sizes. The findings are important because tracing and monitoring changes in clustering solutions has a wide range of applications across fields of research. This study determines the minimum sample size required for clustering time-stamped datasets.
first_indexed 2024-03-07T21:59:29Z
format Article
id doaj.art-d53d1c50f188481bad686970999b8fdc
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-03-07T21:59:29Z
publishDate 2024-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-d53d1c50f188481bad686970999b8fdc; 2024-02-24T05:31:44Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2024-01-01; 19(2): e0297355; 10.1371/journal.pone.0297355; The least sample size essential for detecting changes in clustering solutions of streaming datasets.; Muhammad Atif; Muhammad Farooq; Mohammad Abiad; Muhammad Shafiq; https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0297355&type=printable