A Novel Hybrid PSO-K-Means Clustering Algorithm Using Gaussian Estimation of Distribution Method and Lévy Flight

Clustering is an important data analysis technique, which has been applied to many practical scenarios. However, many partitioning based clustering algorithms are sensitive to the initial state of cluster centroids, may get trapped in a local optimum, and have poor robustness. In recent years, parti...

Full description

Bibliographic Details
Main Authors: Hanjie Gao, Yintong Li, Petr Kabalyants, Hao Xu, Rodrigo Martinez-Bejar
Format: Article
Language:English
Published: IEEE 2020-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/9133522/
Description
Summary:Clustering is an important data analysis technique, which has been applied to many practical scenarios. However, many partitioning based clustering algorithms are sensitive to the initial state of cluster centroids, may get trapped in a local optimum, and have poor robustness. In recent years, particle swarm optimization (PSO) has been regarded as an effective solution to the problem. However, it has the possibility of converging to a local optimum, especially when solving complex problems. In this paper, we propose a hybrid PSO-K-means algorithm, which uses the Gaussian estimation of distribution method (GEDM) to assist PSO in updating the population information and adopts Lévy flight to escape from the local optimum. The proposed algorithm is named a GEDM and Lévy flight based PSO-K-means (GLPSOK) clustering algorithm. Firstly, during initialization, a few particles are initialized using the cluster centroids generated by K-means, while other particles are randomly initialized in the search space. Secondly, GEDM and PSO are selected with different probability to update the population information at different optimization stages. Thirdly, Lévy flight is adopted to help the search escape from the local optimum. Finally, the greedy strategy is carried out to select the promising particles from the parents and the newly generated candidates. Experimental results on both synthetic data sets and real-world data sets show that the proposed algorithm can produce better clustering results and is more robust than existing classic or state-of-the-art clustering algorithms.
ISSN:2169-3536