Ultra-scalable spectral clustering and ensemble clustering

This paper focuses on scalability and robustness of spectral clustering for extremely large-scale datasets with limited resources. Two novel algorithms are proposed, namely, ultra-scalable spectral clustering (U-SPEC) and ultra-scalable ensemble clustering (U-SENC). In U-SPEC, a hybrid representativ...

Full description

Bibliographic Details
Main Authors: Huang, Dong, Wang, Chang-Dong, Wu, Jiansheng, Lai, Jian-Huang, Kwoh, Chee-Keong
Other Authors: School of Computer Science and Engineering
Format: Journal Article
Language:English
Published: 2020
Subjects:
Online Access:https://hdl.handle.net/10356/139670
_version_ 1826118560805552128
author Huang, Dong
Wang, Chang-Dong
Wu, Jiansheng
Lai, Jian-Huang
Kwoh, Chee-Keong
author2 School of Computer Science and Engineering
author_facet School of Computer Science and Engineering
Huang, Dong
Wang, Chang-Dong
Wu, Jiansheng
Lai, Jian-Huang
Kwoh, Chee-Keong
author_sort Huang, Dong
collection NTU
description This paper focuses on scalability and robustness of spectral clustering for extremely large-scale datasets with limited resources. Two novel algorithms are proposed, namely, ultra-scalable spectral clustering (U-SPEC) and ultra-scalable ensemble clustering (U-SENC). In U-SPEC, a hybrid representative selection strategy and a fast approximation method for K -nearest representatives are proposed for the construction of a sparse affinity sub-matrix. By interpreting the sparse sub-matrix as a bipartite graph, the transfer cut is then utilized to efficiently partition the graph and obtain the clustering result. In U-SENC, multiple U-SPEC clusterers are further integrated into an ensemble clustering framework to enhance the robustness of U-SPEC while maintaining high efficiency. Based on the ensemble generation via multiple U-SEPC's, a new bipartite graph is constructed between objects and base clusters and then efficiently partitioned to achieve the consensus clustering result. It is noteworthy that both U-SPEC and U-SENC have nearly linear time and space complexity, and are capable of robustly and efficiently partitioning 10-million-level nonlinearly-separable datasets on a PC with 64 GB memory. Experiments on various large-scale datasets have demonstrated the scalability and robustness of our algorithms. The MATLAB code and experimental data are available at https://www.researchgate.net/publication/330760669 .
first_indexed 2024-10-01T04:45:27Z
format Journal Article
id ntu-10356/139670
institution Nanyang Technological University
language English
last_indexed 2024-10-01T04:45:27Z
publishDate 2020
record_format dspace
spelling ntu-10356/1396702020-05-21T01:45:37Z Ultra-scalable spectral clustering and ensemble clustering Huang, Dong Wang, Chang-Dong Wu, Jiansheng Lai, Jian-Huang Kwoh, Chee-Keong School of Computer Science and Engineering Engineering::Computer science and engineering Data Clustering Large-scale Clustering This paper focuses on scalability and robustness of spectral clustering for extremely large-scale datasets with limited resources. Two novel algorithms are proposed, namely, ultra-scalable spectral clustering (U-SPEC) and ultra-scalable ensemble clustering (U-SENC). In U-SPEC, a hybrid representative selection strategy and a fast approximation method for K -nearest representatives are proposed for the construction of a sparse affinity sub-matrix. By interpreting the sparse sub-matrix as a bipartite graph, the transfer cut is then utilized to efficiently partition the graph and obtain the clustering result. In U-SENC, multiple U-SPEC clusterers are further integrated into an ensemble clustering framework to enhance the robustness of U-SPEC while maintaining high efficiency. Based on the ensemble generation via multiple U-SEPC's, a new bipartite graph is constructed between objects and base clusters and then efficiently partitioned to achieve the consensus clustering result. It is noteworthy that both U-SPEC and U-SENC have nearly linear time and space complexity, and are capable of robustly and efficiently partitioning 10-million-level nonlinearly-separable datasets on a PC with 64 GB memory. Experiments on various large-scale datasets have demonstrated the scalability and robustness of our algorithms. The MATLAB code and experimental data are available at https://www.researchgate.net/publication/330760669 . Accepted version 2020-05-21T01:45:37Z 2020-05-21T01:45:37Z 2019 Journal Article Huang, D., Wang, C.-D., Wu, J.-S., Lai, J.-H., & Kwoh, C.-K. (2020). Ultra-scalable spectral clustering and ensemble clustering. IEEE Transactions on Knowledge and Data Engineering, 32(6), 1212-1226. doi:10.1109/TKDE.2019.2903410 1041-4347 https://hdl.handle.net/10356/139670 10.1109/TKDE.2019.2903410 6 32 1212 1226 en IEEE Transactions on Knowledge and Data Engineering IEEE Transactions on Knowledge and Data Engineering © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/TKDE.2019.2903410 14 p. application/pdf
spellingShingle Engineering::Computer science and engineering
Data Clustering
Large-scale Clustering
Huang, Dong
Wang, Chang-Dong
Wu, Jiansheng
Lai, Jian-Huang
Kwoh, Chee-Keong
Ultra-scalable spectral clustering and ensemble clustering
title Ultra-scalable spectral clustering and ensemble clustering
title_full Ultra-scalable spectral clustering and ensemble clustering
title_fullStr Ultra-scalable spectral clustering and ensemble clustering
title_full_unstemmed Ultra-scalable spectral clustering and ensemble clustering
title_short Ultra-scalable spectral clustering and ensemble clustering
title_sort ultra scalable spectral clustering and ensemble clustering
topic Engineering::Computer science and engineering
Data Clustering
Large-scale Clustering
url https://hdl.handle.net/10356/139670
work_keys_str_mv AT huangdong ultrascalablespectralclusteringandensembleclustering
AT wangchangdong ultrascalablespectralclusteringandensembleclustering
AT wujiansheng ultrascalablespectralclusteringandensembleclustering
AT laijianhuang ultrascalablespectralclusteringandensembleclustering
AT kwohcheekeong ultrascalablespectralclusteringandensembleclustering