A Novel Method of Clustering Using a Stochastic Approach

The cluster analysis of real-life data often encounters the challenges of noisy data or may rely heavily on the uncertainty of the main clustering variable owing to its stochastic nature, which has a potential influence on its performance. In this paper, we propose a novel clustering technique that...

Full description

Bibliographic Details
Main Authors: Gabiriele Bulivou, Karuna G. Reddy, M. G. M. Khan
Format: Article
Language:English
Published: IEEE 2022-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/9938975/
Description
Summary:The cluster analysis of real-life data often encounters the challenges of noisy data or may rely heavily on the uncertainty of the main clustering variable owing to its stochastic nature, which has a potential influence on its performance. In this paper, we propose a novel clustering technique that is efficient in dealing with noise and uncertainty in a dataset by adopting a stochastic approach that uses realistic values of data points by assuming a continuous probability distribution instead of exact values. By estimating the best-fit probability distribution of the clustering variable, the proposed method formulates the problem of determining the most homogeneous clusters by determining the optimum cluster partitions (OCP) as a mathematical programming problem (MPP). A computer-intensive dynamic programming technique was used to solve the MPP and determine the OCP, which minimized the sum of the weighted intracluster standard deviations. The proposed technique is then demonstrated in this study using univariate data that follows a normal distribution, which is a symmetric distribution, as well as the Weibull distribution, which is a skewed distribution. Numerical examples were also presented to illustrate the computational details of the proposed method. Finally, using both simulated and real datasets, a comparative analysis of the effectiveness of the proposed technique was performed against four advanced clustering methods: k-means, fuzzy c-means, expectation maximization, and Genie++ hierarchical clustering. The results reveal that the proposed method works well and produces more efficient clusters than other methods.
ISSN:2169-3536