A systematic density-based clustering method using anchor points

Clustering is an important unsupervised learning method in machine learning and data mining. Many existing clustering methods still have difficulty automatically identifying clusters of varying shapes, sizes and densities. To devise a more generic clustering method that considers all the aforemen...

Bibliographic Details
Main Authors: Wang, Yizhang, Wang, Di, Pang, Wei, Miao, Chunyan, Tan, Ah-Hwee, Zhou, You
Other Authors: School of Computer Science and Engineering
Format: Journal Article
Language: English
Published: 2020
Subjects: Engineering::Computer science and engineering; Density Based Clustering; Anchor Data Points
Online Access: https://hdl.handle.net/10356/144283
collection NTU
description Clustering is an important unsupervised learning method in machine learning and data mining. Many existing clustering methods still have difficulty automatically identifying clusters of varying shapes, sizes and densities. To devise a more generic clustering method that accounts for all of these properties of natural clusters, we propose a novel clustering algorithm named Anchor Points based Clustering (APC). Anchor points in APC are characterized by a relatively large distance from data points with higher densities. We take anchor points as centers to obtain intermediate clusters, which divide the whole dataset more appropriately and so facilitate further grouping. Analyzing the identified anchor points better reveals the relationships among the corresponding intermediate clusters. Specifically, differences in the local densities of the anchor points (densities within neighboring data points) characterize their different properties; that is, the intermediate clusters may fall into one or multiple identified levels with different densities. Finally, based on the properties of the anchor points, APC automatically chooses the appropriate clustering strategies and reports the final clustering results. To evaluate the performance of APC, we conduct experiments on twelve two-dimensional synthetic datasets and twelve multi-dimensional real-world datasets. We also apply APC to the Olivetti Face dataset to further assess its effectiveness for face recognition. The experimental results indicate that APC outperforms four classical methods and two state-of-the-art methods in most cases.
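The anchor-point idea in the description above can be illustrated with a minimal sketch. This is not the authors' APC implementation: it only shows one way to find points that lie far from every higher-density point and to grow intermediate clusters around them. The density estimate (a neighbour count within `cutoff`), the `min_separation` threshold, the index-based tie-breaking, and all function names are assumptions made for illustration. The later APC steps, which analyze anchor densities and choose a final clustering strategy, are not covered.

```python
import math


def local_density(points, cutoff):
    """Density proxy (assumed): number of other points closer than `cutoff`."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) < cutoff)
        for i, p in enumerate(points)
    ]


def intermediate_clusters(points, cutoff, min_separation):
    """Return (labels, anchors); each label is the index of the point's anchor."""
    rho = local_density(points, cutoff)
    # Visit points from densest to sparsest. Ties are broken by index
    # (an assumption of this sketch) so that the ordering is total.
    order = sorted(range(len(points)), key=lambda i: (-rho[i], i))

    labels = [None] * len(points)
    anchors = []
    for rank, i in enumerate(order):
        denser = order[:rank]  # points visited earlier, treated as higher density
        if denser:
            nearest = min(denser, key=lambda k: math.dist(points[i], points[k]))
            delta = math.dist(points[i], points[nearest])
        else:
            nearest, delta = None, math.inf  # the densest point has no denser neighbour
        if delta >= min_separation:
            anchors.append(i)  # far from every denser point: treat it as an anchor
            labels[i] = i
        else:
            labels[i] = labels[nearest]  # inherit the label of the nearest denser point
    return labels, anchors


# Two well-separated square blobs of four points each (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels, anchors = intermediate_clusters(points, cutoff=1.5, min_separation=5.0)
print(anchors)  # [0, 4]
print(labels)   # [0, 0, 0, 0, 4, 4, 4, 4]
```

In this toy run each blob yields one anchor, and every other point inherits the label of its nearest denser neighbour. Both thresholds are hand-picked for the example and would need tuning on real data.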
first_indexed 2024-10-01T06:12:31Z
id ntu-10356/144283
institution Nanyang Technological University
last_indexed 2024-10-01T06:12:31Z
record_format dspace
spelling ntu-10356/144283 2021-02-03T05:20:23Z
Funding agencies: AI Singapore; Ministry of Health (MOH); National Research Foundation (NRF)
Version: Accepted version
Acknowledgements: This research is supported by the National Natural Science Foundation of China (61772227, 61572227), the Science & Technology Development Foundation of Jilin Province (20180201045GX) and the Social Science Foundation of Education Department of Jilin Province (JJKH20181315SK). This research is also supported, in part, by the National Research Foundation Singapore under its AI Singapore Programme (Award Number: AISG-GC-2019-003), the Singapore Ministry of Health under its National Innovation Challenge on Active and Confident Ageing (NIC Project No. MOH/NIC/COG04/2017), and the Joint NTU-WeBank Research Centre on Fintech, Nanyang Technological University, Singapore.
Dates: 2020-10-27T01:19:16Z; 2020-10-27T01:19:16Z; 2020
Citation: Wang, Y., Wang, D., Pang, W., Miao, C., Tan, A.-H., & Zhou, Y. (2020). A systematic density-based clustering method using anchor points. Neurocomputing, 400, 352-370. doi:10.1016/j.neucom.2020.02.119
ISSN: 0925-2312
DOI: 10.1016/j.neucom.2020.02.119
Journal: Neurocomputing, volume 400, pages 352-370
Rights: © 2020 Elsevier B.V. All rights reserved. This paper was published in Neurocomputing and is made available with permission of Elsevier B.V.
File format: application/pdf
topic Engineering::Computer science and engineering
Density Based Clustering
Anchor Data Points