An Efficient Method to Accurately Cluster Large Number of High Dimensional Facial Images


Bibliographic Details
Main Authors: K. B. Shibu Kumar, Philip Samuel
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10105925/
Description: Accurately clustering large, high-dimensional datasets is a challenging problem in unsupervised learning. K-means is a widely used centroid-based partitioning algorithm that is considered fast and accurate for spherical datasets. However, its non-determinism, its heavy dependence on the choice of initial cluster centers, and its sensitivity to noise make it a poor candidate for clustering large, high-dimensional datasets. To overcome these limitations, we develop a novel, nature-inspired, centroid-based clustering algorithm based on principles of particle physics. Our method avoids convergence to local optima and non-deterministic outputs. We evaluate the method on large datasets of human face images. It also addresses outliers and poorly separated data, both of which are present in these datasets. We use a deep learning model to extract the facial features of each image into a 128-dimensional vector. We validate the quality and accuracy of our methods using statistical measures such as F-measure, accuracy, error rate, average in-group proportion, and normalized cluster size Rand index. These evaluations show that our method clusters large face image datasets with better accuracy and quality than existing methods, and that the strength of our algorithms becomes more visible as the dataset grows.
Collection: Directory of Open Access Journals (DOAJ)
ISSN: 2169-3536
Volume: 11
Pages: 39934-39949
DOI: 10.1109/ACCESS.2023.3268862
Authors and affiliations: K. B. Shibu Kumar (ORCID: https://orcid.org/0000-0002-3783-0599), Department of Computer Science and Engineering, College of Engineering Trivandrum, Thiruvananthapuram, India; Philip Samuel, Department of Computer Science, Cochin University of Science and Technology, Kochi, India
Keywords: Centroid based celestial clustering; face clustering on large datasets; optimization on clustering; refined celestial PSO clustering
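The description above attributes K-means' weakness to its dependence on the initial cluster centers and lists F-measure and a Rand index variant among the evaluation measures. The record does not describe the authors' particle-physics-inspired algorithm, so the sketch below is not that method. It is a plain Lloyd's K-means in pure Python on a hypothetical four-point toy set (2-D stand-ins for the 128-dimensional face embeddings), showing two starting choices that converge to different partitions. It scores them with the standard Rand index and one common pairwise F-measure; the paper's exact definitions, and its normalized cluster size Rand index in particular, may differ. All function names and data here are illustrative assumptions.

```python
import random
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_init(points, k, seed):
    """Forgy-style start: k distinct data points chosen with a seeded RNG."""
    return random.Random(seed).sample(points, k)


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd's K-means from the given starting centers (illustrative only,
    not the paper's algorithm). Returns (labels, centers, sse)."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable: converged, possibly to a local optimum
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, sse


def rand_index(a, b):
    """Plain Rand index: fraction of point pairs on which two labelings agree
    about 'same cluster' versus 'different cluster'."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def pairwise_f_measure(pred, truth):
    """One common pairwise F-measure (assumed; the paper's may differ)."""
    pairs = list(combinations(range(len(pred)), 2))
    tp = sum(pred[i] == pred[j] and truth[i] == truth[j] for i, j in pairs)
    if tp == 0:
        return 0.0
    precision = tp / sum(pred[i] == pred[j] for i, j in pairs)
    recall = tp / sum(truth[i] == truth[j] for i, j in pairs)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: two "identities", each a pair of nearby points.
POINTS = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
TRUTH = [0, 0, 1, 1]

if __name__ == "__main__":
    good, _, sse_good = kmeans(POINTS, [POINTS[0], POINTS[2]])
    bad, _, sse_bad = kmeans(POINTS, [POINTS[0], POINTS[1]])
    print(good, sse_good)  # [0, 0, 1, 1] 1.0
    print(bad, sse_bad)    # [0, 1, 0, 1] 100.0
    print(pairwise_f_measure(good, TRUTH), pairwise_f_measure(bad, TRUTH))  # 1.0 0.0
    print(rand_index(good, bad))  # the two runs agree on 2 of 6 pairs
    for seed in range(5):
        # Different seeds may pick either kind of start.
        print(seed, kmeans(POINTS, random_init(POINTS, 2, seed))[2])
```

With the first start, Lloyd's iterations reach the natural split (SSE 1.0); with the second, they stop at a local optimum (SSE 100.0) that splits each identity in half, so the two runs agree on only a third of the point pairs. Seeding `random_init` differently can land in either outcome, which is the kind of initialization-driven non-determinism the description refers to.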