A Novel Method of Clustering Using a Stochastic Approach
The cluster analysis of real-life data often encounters the challenges of noisy data or may rely heavily on the uncertainty of the main clustering variable owing to its stochastic nature, which has a potential influence on its performance. In this paper, we propose a novel clustering technique that...
Main Authors: | Gabiriele Bulivou, Karuna G. Reddy, M. G. M. Khan |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2022-01-01 |
Series: | IEEE Access |
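Subjects: | Clustering methods; dynamic programming; mathematical programming; optimum cluster partitions; statistical probability distributions |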
Online Access: | https://ieeexplore.ieee.org/document/9938975/ |
_version_ | 1828099009804238848 |
---|---|
author | Gabiriele Bulivou; Karuna G. Reddy; M. G. M. Khan |
author_facet | Gabiriele Bulivou; Karuna G. Reddy; M. G. M. Khan |
author_sort | Gabiriele Bulivou |
collection | DOAJ |
description | The cluster analysis of real-life data often encounters the challenges of noisy data or may rely heavily on the uncertainty of the main clustering variable owing to its stochastic nature, which has a potential influence on its performance. In this paper, we propose a novel clustering technique that is efficient in dealing with noise and uncertainty in a dataset by adopting a stochastic approach that uses realistic values of data points by assuming a continuous probability distribution instead of exact values. By estimating the best-fit probability distribution of the clustering variable, the proposed method formulates the problem of determining the most homogeneous clusters by determining the optimum cluster partitions (OCP) as a mathematical programming problem (MPP). A computer-intensive dynamic programming technique was used to solve the MPP and determine the OCP, which minimized the sum of the weighted intracluster standard deviations. The proposed technique is then demonstrated in this study using univariate data that follows a normal distribution, which is a symmetric distribution, as well as the Weibull distribution, which is a skewed distribution. Numerical examples were also presented to illustrate the computational details of the proposed method. Finally, using both simulated and real datasets, a comparative analysis of the effectiveness of the proposed technique was performed against four advanced clustering methods: k-means, fuzzy c-means, expectation maximization, and Genie++ hierarchical clustering. The results reveal that the proposed method works well and produces more efficient clusters than other methods. |
first_indexed | 2024-04-11T08:09:41Z |
format | Article |
id | doaj.art-e7d8fc4245dd40ada7573683e46743f0 |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-11T08:09:41Z |
publishDate | 2022-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-e7d8fc4245dd40ada7573683e46743f0; 2022-12-22T04:35:23Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2022-01-01; vol. 10, pp. 117925-117943; DOI 10.1109/ACCESS.2022.3219457; IEEE Xplore document 9938975; A Novel Method of Clustering Using a Stochastic Approach; Gabiriele Bulivou (https://orcid.org/0000-0002-8984-6306), Karuna G. Reddy, M. G. M. Khan (https://orcid.org/0000-0001-5400-9703), all at the School of Information Technology, Engineering, Mathematics and Physics, The University of the South Pacific, Suva, Fiji; https://ieeexplore.ieee.org/document/9938975/; keywords: Clustering methods, dynamic programming, mathematical programming, optimum cluster partitions, statistical probability distributions |
title | A Novel Method of Clustering Using a Stochastic Approach |
topic | Clustering methods; dynamic programming; mathematical programming; optimum cluster partitions; statistical probability distributions |
url | https://ieeexplore.ieee.org/document/9938975/ |
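The description in this record says that the optimum cluster partitions are found by dynamic programming, minimizing the sum of the weighted intracluster standard deviations under a fitted distribution. The following is a minimal sketch of that idea for a normal clustering variable, not the authors' implementation. It rests on several assumptions that the record does not confirm: each cluster is an interval, its weight is the probability mass of that interval, its spread is the standard deviation of the normal truncated to that interval, and the cut points are restricted to an evenly spaced grid over the mean plus or minus `span` standard deviations. The names `weighted_sd` and `optimum_cut_points` are hypothetical.

```python
import math


def pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def weighted_sd(a, b, mu, sigma):
    """Probability mass of [a, b] times the SD of N(mu, sigma^2) truncated to it.

    Assumption: this is the per-cluster term of the objective.
    """
    za, zb = (a - mu) / sigma, (b - mu) / sigma
    mass = cdf(zb) - cdf(za)
    if mass <= 0.0:
        return 0.0
    shift = (pdf(za) - pdf(zb)) / mass
    var = 1.0 + (za * pdf(za) - zb * pdf(zb)) / mass - shift * shift
    return mass * sigma * math.sqrt(max(var, 0.0))  # clamp rounding noise


def optimum_cut_points(k, mu=0.0, sigma=1.0, span=4.0, steps=80):
    """Return (cut points, objective value) for k interval clusters.

    Candidate cuts lie on an evenly spaced grid (an assumption of this sketch).
    """
    lo = mu - span * sigma
    width = 2.0 * span * sigma / steps
    grid = [lo + i * width for i in range(steps + 1)]
    inf = float("inf")
    # best[h][j]: minimum cost of splitting grid[0]..grid[j] into h clusters
    best = [[inf] * (steps + 1) for _ in range(k + 1)]
    prev = [[-1] * (steps + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for h in range(1, k + 1):
        for j in range(h, steps + 1):
            for i in range(h - 1, j):
                cost = best[h - 1][i] + weighted_sd(grid[i], grid[j], mu, sigma)
                if cost < best[h][j]:
                    best[h][j], prev[h][j] = cost, i
    # Walk back through the stored split indices to recover the k - 1 cuts.
    cuts, j = [], steps
    for h in range(k, 1, -1):
        j = prev[h][j]
        cuts.append(grid[j])
    return sorted(cuts), best[k][steps]


if __name__ == "__main__":
    for k in (2, 3, 4):
        cuts, cost = optimum_cut_points(k)
        print(k, [round(c, 2) for c in cuts], round(cost, 4))
```

With a symmetric distribution and two clusters, this sketch places the single cut at the mean, and the objective decreases as more clusters are allowed. A skewed case such as the Weibull distribution named in the description would need its own truncated-moment formulas in place of `weighted_sd`.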