Robust large-scale clustering based on correntropy.

With the explosive growth of data, efficiently clustering large-scale unlabeled data has become an urgent problem. Especially in the face of large-scale real-world data, which contain large amounts of noise and outliers with complex distributions, the research...

Bibliographic Details
Main Authors: Guodong Jin, Jing Gao, Lining Tan
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2022-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0277012
author Guodong Jin
Jing Gao
Lining Tan
collection DOAJ
description With the explosive growth of data, efficiently clustering large-scale unlabeled data has become an urgent problem. Large-scale real-world data in particular contain large amounts of noise and outliers with complex distributions, so research on robust clustering algorithms for such data has become one of the hottest topics. In response, this paper proposes a robust large-scale clustering algorithm based on correntropy (RLSCC). Specifically, k-means is first applied to generate pseudo-labels, which reduce the input scale of the subsequent spectral clustering; anchor graphs, rather than full sample graphs, are then introduced into spectral clustering to obtain the final clustering results from the pseudo-labels, which further improves efficiency. RLSCC therefore inherits the effectiveness of k-means and spectral clustering while greatly reducing computational complexity. Furthermore, correntropy is used to suppress the influence of noise and outliers in real-world data on the robustness of clustering. Finally, extensive experiments on real-world datasets and noisy datasets show that, compared with other state-of-the-art algorithms, RLSCC greatly improves efficiency and robustness while maintaining comparable or even higher clustering effectiveness.
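The description above does not state the RLSCC objective, so the following is only an illustrative sketch of why a correntropy-style measure is less sensitive to outliers than a squared-error measure; it is not the authors' algorithm. It assumes an unnormalized Gaussian kernel with a hypothetical bandwidth `sigma`, and all function names are invented for illustration.

```python
import numpy as np


def gaussian_kernel(err, sigma=1.0):
    """Unnormalized Gaussian kernel; `sigma` is an assumed bandwidth."""
    return np.exp(-np.square(err) / (2.0 * sigma ** 2))


def empirical_correntropy(x, y, sigma=1.0):
    """Sample estimate of correntropy: mean kernel value of the errors x - y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(gaussian_kernel(x - y, sigma)))


def correntropy_weighted_center(points, sigma=1.0, n_iter=20):
    """Robust cluster centre via iterative kernel re-weighting.

    Hypothetical helper: each point is weighted by the Gaussian kernel of its
    distance to the current centre, so far-away points contribute almost nothing.
    """
    points = np.asarray(points, dtype=float)
    center = np.median(points, axis=0)  # robust starting point
    for _ in range(n_iter):
        sq_dist = np.sum((points - center) ** 2, axis=1)
        weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
        center = (weights[:, None] * points).sum(axis=0) / weights.sum()
    return center
```

In this sketch a single distant point pulls the ordinary mean far from the cluster, while its kernel weight is close to zero, so the re-weighted centre stays with the bulk of the data. The bandwidth controls how quickly a point's influence decays with distance.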
first_indexed 2024-04-13T21:53:04Z
format Article
id doaj.art-c731a94781114624a12f7687cecca273
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-04-13T21:53:04Z
publishDate 2022-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-c731a94781114624a12f7687cecca273 | 2022-12-22T02:28:22Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2022-01-01 | 17 | 11 | e0277012 | 10.1371/journal.pone.0277012 | Robust large-scale clustering based on correntropy. | Guodong Jin; Jing Gao; Lining Tan | https://doi.org/10.1371/journal.pone.0277012
title Robust large-scale clustering based on correntropy.
url https://doi.org/10.1371/journal.pone.0277012