Model Selection Using K-Means Clustering Algorithm for the Symmetrical Segmentation of Remote Sensing Datasets

Bibliographic Details
Main Authors: Ishfaq Ali, Atiq Ur Rehman, Dost Muhammad Khan, Zardad Khan, Muhammad Shafiq, Jin-Ghoo Choi
Format: Article
Language: English
Published: MDPI AG, 2022-06-01
Series: Symmetry
Online Access:https://www.mdpi.com/2073-8994/14/6/1149

Abstract
The importance of unsupervised clustering methods is well established in the statistics and machine learning literature, and many sophisticated unsupervised classification techniques have been developed to handle a growing number of datasets. Owing to its simplicity and efficiency in clustering large datasets, the k-means algorithm remains popular and widely used in the machine learning community. However, like other clustering methods, it requires the number of clusters to be chosen in advance. The primary aim of this paper is to develop a novel, data-driven method for finding the optimal number of clusters, k. Taking the cluster symmetry property into account, the k-means algorithm is applied multiple times over a range of k values within which the balanced optimal k is expected to lie. The method relies on the uniqueness and symmetry of the centroid values of the resulting clusters: the final k is chosen as the value for which symmetry is observed. We evaluated the proposed algorithm on simulated datasets with controlled parameters and on real datasets from the UCI Machine Learning Repository. We also evaluated it in remote sensing applications, such as monitoring deforestation and urbanization, using Sentinel-2B satellite images of the Islamabad region of Pakistan obtained from the United States Geological Survey. The experimental results and real data analysis indicate that the proposed algorithm achieves higher accuracy and a lower root mean square error than existing methods.
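The abstract describes the selection loop only at a high level: run k-means repeatedly over a candidate range of k and keep the k whose centroids show the expected uniqueness and symmetry. The Python 3.9+ sketch below illustrates that loop under stated assumptions and is not the authors' code. The record does not give the paper's exact symmetry criterion, so `centroids_reproducible` is a hypothetical stand-in. It accepts k when repeated runs from different k-means++ seeds reproduce the same centroids, and `select_k` returns the largest such k. `accuracy_and_rmse` reflects a further assumption: that accuracy and RMSE are measured between the estimated and true number of clusters on simulated data. All function names are illustrative.

```python
import math
import random
from typing import Optional

Point = tuple[float, ...]


def kmeanspp_init(points: list[Point], k: int, rng: random.Random) -> list[Point]:
    """k-means++ seeding: each new seed is drawn with probability
    proportional to its squared distance from the nearest existing seed."""
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        d2 = [min(math.dist(p, s) ** 2 for s in seeds) for p in points]
        if sum(d2) == 0:  # every point coincides with a seed: fall back to uniform
            seeds.append(rng.choice(points))
        else:
            seeds.append(rng.choices(points, weights=d2)[0])
    return seeds


def kmeans(points: list[Point], k: int, rng: random.Random, max_iter: int = 100) -> list[Point]:
    """Lloyd's algorithm; centroids are returned sorted so runs can be compared."""
    centroids = kmeanspp_init(points, k, rng)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters: list[list[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(xs) / len(members) for xs in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:  # assignments no longer change the centroids
            break
        centroids = updated
    return sorted(centroids)


def centroids_reproducible(points: list[Point], k: int, restarts: int = 10,
                           tol: float = 1e-6, seed: int = 0) -> bool:
    """HYPOTHETICAL stand-in for the paper's symmetry criterion: accept k
    when every restart reproduces the first run's centroids within `tol`."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(restarts)]
    return all(math.dist(a, b) <= tol
               for run in runs[1:] for a, b in zip(runs[0], run))


def select_k(points: list[Point], k_values, **kwargs) -> Optional[int]:
    """Scan candidate k values and return the largest reproducible one."""
    stable = [k for k in k_values if centroids_reproducible(points, k, **kwargs)]
    return max(stable) if stable else None


def accuracy_and_rmse(true_ks: list[int], est_ks: list[int]) -> tuple[float, float]:
    """Assumed evaluation: share of exact hits and RMSE between estimated
    and true numbers of clusters across simulated datasets."""
    n = len(true_ks)
    hits = sum(t == e for t, e in zip(true_ks, est_ks))
    rmse = math.sqrt(sum((e - t) ** 2 for t, e in zip(true_ks, est_ks)) / n)
    return hits / n, rmse


if __name__ == "__main__":
    # Illustrative synthetic data: three Gaussian blobs (not the paper's datasets).
    gen = random.Random(42)
    centers = [(0.0, 0.0), (8.0, 0.0), (0.0, 8.0)]
    data = [(gen.gauss(cx, 0.5), gen.gauss(cy, 0.5)) for cx, cy in centers for _ in range(30)]
    print("selected k:", select_k(data, range(2, 7)))
```

You could replace `centroids_reproducible` with the paper's actual symmetry test, or with a clustering validity index such as the silhouette score. Either swap changes only the acceptance rule; the outer loop over candidate k values stays the same.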

Publication Details
Collection: Directory of Open Access Journals (DOAJ)
ISSN: 2073-8994
Volume: 14
Issue: 6
Article number: 1149
DOI: 10.3390/sym14061149
Affiliations:
Ishfaq Ali, Dost Muhammad Khan, Zardad Khan: Department of Statistics, Abdul Wali Khan University, Mardan 23200, Pakistan
Atiq Ur Rehman: Department of Mathematics and Statistics, Faculty of Basic and Applied Sciences, International Islamic University, Islamabad 44000, Pakistan
Muhammad Shafiq, Jin-Ghoo Choi: Department of Information and Communication Engineering, Yeungnam University, Gyeongsan 38541, Korea
Keywords: unsupervised clustering; k-means; balanced optimal number of clusters; symmetry; clustering validity indices; remote sensing