Enhanced Firefly-K-Means Clustering with Adaptive Mutation and Central Limit Theorem for Automatic Clustering of High-Dimensional Datasets

Bibliographic Details
Main Authors:	Abiodun M. Ikotun, Absalom E. Ezugwu
Format:	Article
Language:	English
Published:	MDPI AG 2022-11-01
Series:	Applied Sciences
Subjects:	clustering algorithms; metaheuristic algorithms; hybrid clustering; K-means; firefly algorithms; central limit theorem
Online Access:	https://www.mdpi.com/2076-3417/12/23/12275

Description:	Metaheuristic algorithms have been hybridized with the standard K-means to address the latter’s difficulty in solving automatic clustering problems. However, the number of distance calculations required in the standard K-means phase of these hybrid clustering algorithms grows with the number of clusters, and the associated computational cost rises in proportion to the dataset dimensionality. Using the standard K-means algorithm in a metaheuristic-based K-means hybrid for the automatic clustering of high-dimensional real-world datasets therefore makes the resulting hybrid algorithm computationally expensive. Reducing the computation time required in the K-means phase of the hybrid algorithm will reduce the algorithm’s overall cost. In this paper, a preprocessing phase is introduced into the K-means phase of an improved firefly-based K-means hybrid algorithm. It uses the concept of the central limit theorem (CLT) to partition the high-dimensional dataset into randomly formed subsets, on which the K-means algorithm is applied to obtain representative cluster centers for the final clustering procedure. The enhanced firefly algorithm (FA) is hybridized with the CLT-based K-means algorithm to automatically determine the optimum number of cluster centroids and to generate the corresponding optimum initial cluster centroids, so that the K-means algorithm can achieve optimal global convergence. Twenty high-dimensional datasets from the UCI machine learning repository are used to investigate the performance of the proposed algorithm. The empirical results indicate that the hybrid FA-K-means clustering method is statistically significantly better on the employed performance measures, and requires less computation time for clustering high-dimensional datasets, than other advanced hybrid search variants.
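The subset-based K-means phase described above can be illustrated with a minimal sketch. This record contains only the abstract, so the details below are assumptions rather than the authors' method: the function names (`kmeans`, `subset_kmeans`), the parameters `n_subsets` and `subset_size`, and the choice to re-cluster the pooled subset centers are all hypothetical. The firefly phase is omitted, so `k` is fixed by the caller instead of being determined automatically.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's K-means on a list of tuples; returns k centers."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centers[j]))
            groups[nearest].append(p)
        # Keep the old center when a cluster ends up empty.
        new_centers = [mean(g) if g else centers[j] for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def subset_kmeans(points, k, n_subsets=10, subset_size=30, seed=0):
    """Assumed scheme: K-means on random subsets, then on the pooled centers."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_subsets):
        # Each subset is a random sample, so K-means sees far fewer points.
        subset = rng.sample(points, min(subset_size, len(points)))
        pooled.extend(kmeans(subset, k, rng))
    # Cluster the pooled centers into k representatives,
    # then label the full dataset in a single pass.
    centers = kmeans(pooled, k, rng)
    labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
    return centers, labels


if __name__ == "__main__":
    # Two well-separated groups of 5-dimensional points (illustrative data only).
    low = [(i * 0.05,) * 5 for i in range(20)]
    high = [(10 + i * 0.05,) * 5 for i in range(20)]
    centers, labels = subset_kmeans(low + high, k=2)
    print(centers)
    print(labels)
```

The point of the design is that the iterative distance calculations run only on small subsets and on the pooled centers; the full dataset is scanned once, for the final labeling.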
DOI:	10.3390/app122312275
ISSN:	2076-3417
Citation:	Applied Sciences, vol. 12, no. 23, article 12275 (2022-11-01)
Affiliations:	Abiodun M. Ikotun and Absalom E. Ezugwu: School of Mathematics, Statistics, and Computer Science, Pietermaritzburg Campus, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg 3201, South Africa
Source:	DOAJ (Directory of Open Access Journals), record doaj.art-037cb6727de14051addfe03739d0cbc2

Similar Items