An Efficient Method to Accurately Cluster Large Number of High Dimensional Facial Images
Accurately clustering large, high dimensional datasets is a challenging problem in unsupervised learning. K-means is considered to be a fast, widely used and accurate centroid based data partitioning algorithm for spherical datasets. However, its non-determinism and heavy dependence on the selection...
Main Authors: | K. B. Shibu Kumar, Philip Samuel |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023-01-01 |
Series: | IEEE Access |
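Subjects: | Centroid based celestial clustering; face clustering on large datasets; optimization on clustering; refined celestial PSO clustering |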
Online Access: | https://ieeexplore.ieee.org/document/10105925/ |
_version_ | 1797835805270474752 |
---|---|
author | K. B. Shibu Kumar; Philip Samuel |
author_facet | K. B. Shibu Kumar; Philip Samuel |
author_sort | K. B. Shibu Kumar |
collection | DOAJ |
description | Accurately clustering large, high-dimensional datasets is a challenging problem in unsupervised learning. K-means is considered a fast, widely used, and accurate centroid-based data partitioning algorithm for spherical datasets. However, its non-determinism and heavy dependence on the selection of initial cluster centers, along with its vulnerability to noise, make it a poor candidate for clustering large datasets with high dimensionality. To overcome these limitations, we develop a novel, nature-inspired, centroid-based clustering algorithm based on principles of particle physics. Our method avoids convergence to local optima and non-deterministic outputs. We evaluate the method on large datasets of human face images. Our method also addresses outliers and poorly separated data in these datasets. We use a deep learning model to extract facial features into a 128-dimensional vector. We validate the quality and accuracy of our methods using statistical measures such as f-measure, accuracy, error rate, average in group proportion, and normalized cluster size Rand index. These evaluations show that our method achieves better accuracy and quality in clustering large face image datasets than existing methods. The strength of our algorithms becomes more visible as the size of the dataset grows. |
first_indexed | 2024-04-09T14:58:21Z |
format | Article |
id | doaj.art-33c2a0a65ccb4e5e8133e81e921d0bed |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-09T14:58:21Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-33c2a0a65ccb4e5e8133e81e921d0bed; 2023-05-01T23:00:57Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; 11; 39934; 39949; 10.1109/ACCESS.2023.3268862; 10105925; An Efficient Method to Accurately Cluster Large Number of High Dimensional Facial Images; K. B. Shibu Kumar (https://orcid.org/0000-0002-3783-0599), Department of Computer Science and Engineering, College of Engineering Trivandrum, Thiruvananthapuram, India; Philip Samuel, Department of Computer Science, Cochin University of Science and Technology, Kochi, India; https://ieeexplore.ieee.org/document/10105925/; Centroid based celestial clustering; face clustering on large datasets; optimization on clustering; refined celestial PSO clustering |
spellingShingle | K. B. Shibu Kumar; Philip Samuel; An Efficient Method to Accurately Cluster Large Number of High Dimensional Facial Images; IEEE Access; Centroid based celestial clustering; face clustering on large datasets; optimization on clustering; refined celestial PSO clustering |
title | An Efficient Method to Accurately Cluster Large Number of High Dimensional Facial Images |
title_full | An Efficient Method to Accurately Cluster Large Number of High Dimensional Facial Images |
title_fullStr | An Efficient Method to Accurately Cluster Large Number of High Dimensional Facial Images |
title_full_unstemmed | An Efficient Method to Accurately Cluster Large Number of High Dimensional Facial Images |
title_short | An Efficient Method to Accurately Cluster Large Number of High Dimensional Facial Images |
title_sort | efficient method to accurately cluster large number of high dimensional facial images |
topic | Centroid based celestial clustering; face clustering on large datasets; optimization on clustering; refined celestial PSO clustering |
url | https://ieeexplore.ieee.org/document/10105925/ |
work_keys_str_mv | AT kbshibukumar anefficientmethodtoaccuratelyclusterlargenumberofhighdimensionalfacialimages AT philipsamuel anefficientmethodtoaccuratelyclusterlargenumberofhighdimensionalfacialimages AT kbshibukumar efficientmethodtoaccuratelyclusterlargenumberofhighdimensionalfacialimages AT philipsamuel efficientmethodtoaccuratelyclusterlargenumberofhighdimensionalfacialimages |
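
The description in this record notes that K-means depends heavily on the choice of initial cluster centers. The sketch below illustrates that point only; it is not the authors' algorithm. It runs plain Lloyd's K-means from several random initializations on synthetic 128-dimensional vectors (standing in for face embeddings, which are not available here) and compares each result with the known grouping using the ordinary Rand index, not the normalized cluster size variant named in the abstract. The function names `kmeans` and `rand_index`, the blob parameters, and the seeds are all assumptions made for illustration.

```python
import numpy as np


def kmeans(X, k, seed, iters=50):
    """Plain Lloyd's K-means; initial centers are k random rows of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each center; keep the old one if its cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment and within-cluster sum of squares for the last centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), float(d2.min(axis=1).sum())


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree."""
    a, b = np.asarray(a), np.asarray(b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    upper = np.triu_indices(len(a), k=1)  # each unordered pair once
    return float((same_a[upper] == same_b[upper]).mean())


# Synthetic stand-in data: 4 groups of 50 points in 128 dimensions (assumed).
data_rng = np.random.default_rng(42)
true_centers = data_rng.normal(scale=1.0, size=(4, 128))
truth = np.repeat(np.arange(4), 50)
X = true_centers[truth] + data_rng.normal(scale=0.5, size=(200, 128))

# Different seeds mean different initial centers; results may differ per run.
for seed in range(5):
    labels, inertia = kmeans(X, k=4, seed=seed)
    print(seed, round(inertia, 1), round(rand_index(truth, labels), 3))
```

If some seeds report a higher within-cluster sum of squares or a lower Rand index than others, that is the initialization sensitivity the description refers to; a fixed seed always reproduces the same output.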