A Novel Method of Clustering Using a Stochastic Approach

Bibliographic Details
Main Authors: Gabiriele Bulivou, Karuna G. Reddy, M. G. M. Khan
Format: Article
Language: English
Published: IEEE 2022-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9938975/

Abstract
The cluster analysis of real-life data is often complicated by noise, or by uncertainty in the main clustering variable arising from its stochastic nature, and either can degrade clustering performance. In this paper, we propose a novel clustering technique that handles noise and uncertainty in a dataset through a stochastic approach: rather than treating data points as exact values, it represents the clustering variable by an assumed continuous probability distribution. After estimating the best-fit probability distribution of the clustering variable, the proposed method formulates the search for the most homogeneous clusters, that is, the optimum cluster partitions (OCP), as a mathematical programming problem (MPP). A computationally intensive dynamic programming technique is used to solve the MPP and determine the OCP that minimizes the sum of the weighted intracluster standard deviations. The technique is demonstrated on univariate data following a normal distribution, which is symmetric, and a Weibull distribution, which is skewed, and numerical examples illustrate the computational details of the method. Finally, using both simulated and real datasets, the effectiveness of the proposed technique is compared with that of four advanced clustering methods: k-means, fuzzy c-means, expectation maximization, and Genie++ hierarchical clustering. The results show that the proposed method performs well and produces more efficient clusters than the other four methods.
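The abstract does not spell out the objective or the recursion, so what follows is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each cluster h is an interval of the fitted density f, that its weight W_h is the probability mass of that interval, and that sigma_h is the standard deviation of f restricted to it. Under those assumptions W_h * sigma_h = sqrt(W_h * S2_h - S1_h^2), where S1_h and S2_h are the integrals of x*f(x) and x^2*f(x) over the interval, so prefix sums make each cluster's cost O(1). The dynamic programme then searches boundaries on a fixed grid over a truncated support [a, b]. The helper names (`optimum_partitions`, `weibull_pdf`, `assign`) and the grid size `m` are hypothetical.

```python
import bisect
import math
import random
from statistics import NormalDist


def _cumulative_moments(pdf, a, b, m):
    """Trapezoid-rule prefix integrals of f(x), x*f(x) and x^2*f(x) on a uniform grid."""
    xs = [a + (b - a) * i / m for i in range(m + 1)]
    fs = [pdf(x) for x in xs]
    h = (b - a) / m
    c0, c1, c2 = [0.0], [0.0], [0.0]
    for i in range(m):
        x0, x1, f0, f1 = xs[i], xs[i + 1], fs[i], fs[i + 1]
        c0.append(c0[-1] + 0.5 * h * (f0 + f1))
        c1.append(c1[-1] + 0.5 * h * (x0 * f0 + x1 * f1))
        c2.append(c2[-1] + 0.5 * h * (x0 * x0 * f0 + x1 * x1 * f1))
    return xs, c0, c1, c2


def optimum_partitions(pdf, a, b, k, m=300):
    """Hypothetical helper: pick k-1 grid boundaries on [a, b] minimising
    sum_h W_h * sigma_h, with W_h = probability mass of cluster h (assumed weight)."""
    xs, c0, c1, c2 = _cumulative_moments(pdf, a, b, m)

    def cost(i, j):
        # W * sigma over [xs[i], xs[j]] equals sqrt(W * S2 - S1^2).
        w, s1, s2 = c0[j] - c0[i], c1[j] - c1[i], c2[j] - c2[i]
        return math.sqrt(max(w * s2 - s1 * s1, 0.0))

    inf = float("inf")
    # best[c][j]: least cost of splitting [xs[0], xs[j]] into c non-empty clusters.
    best = [[inf] * (m + 1) for _ in range(k + 1)]
    start = [[0] * (m + 1) for _ in range(k + 1)]  # grid index where cluster c begins
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, m + 1):
            for i in range(c - 1, j):
                if best[c - 1][i] == inf:
                    continue
                value = best[c - 1][i] + cost(i, j)
                if value < best[c][j]:
                    best[c][j], start[c][j] = value, i
    # Walk back from the right end to recover the k-1 interior boundaries.
    cuts, j = [], m
    for c in range(k, 1, -1):
        j = start[c][j]
        cuts.append(xs[j])
    return sorted(cuts), best[k][m]


def weibull_pdf(shape, scale):
    """Two-parameter Weibull density; returns 0 for x <= 0 (exact at 0 when shape > 1)."""
    def f(x):
        if x <= 0.0:
            return 0.0
        z = x / scale
        return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))
    return f


def assign(values, cuts):
    """Cluster label 0..len(cuts) for each observation, given sorted boundaries."""
    return [bisect.bisect_right(cuts, v) for v in values]


if __name__ == "__main__":
    rng = random.Random(2022)
    data = [rng.gauss(50.0, 10.0) for _ in range(500)]
    fitted = NormalDist.from_samples(data)  # estimated mean and standard deviation
    lo, hi = fitted.mean - 4 * fitted.stdev, fitted.mean + 4 * fitted.stdev
    cuts, objective = optimum_partitions(fitted.pdf, lo, hi, k=3)
    labels = assign(data, cuts)
    print("boundaries:", [round(c, 2) for c in cuts])
    print("objective:", round(objective, 3))
    print("cluster sizes:", [labels.count(h) for h in range(3)])
```

Two design choices here are the sketch's own. Truncating the support to mean plus or minus four standard deviations discards only a negligible tail for a normal fit. Restricting boundaries to grid points turns the continuous MPP into an O(k m^2) search. For a skewed variable, `weibull_pdf(shape, scale)` can be passed in place of `fitted.pdf`, with `a = 0`.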

Publication Details
Volume: 10
Pages: 117925-117943
DOI: 10.1109/ACCESS.2022.3219457
ISSN: 2169-3536
Affiliation (all authors): School of Information Technology, Engineering, Mathematics and Physics, The University of the South Pacific, Suva, Fiji
ORCID: Gabiriele Bulivou, https://orcid.org/0000-0002-8984-6306; M. G. M. Khan, https://orcid.org/0000-0001-5400-9703
Subjects: Clustering methods; dynamic programming; mathematical programming; optimum cluster partitions; statistical probability distributions
Collection: DOAJ (Directory of Open Access Journals)