Stratified Feature Sampling for Semi-Supervised Ensemble Clustering

Ensemble Clustering (EC), which seeks to generate a consensus clustering by integrating multiple base clusterings, has attracted increasing attention. However, traditional EC methods typically have three main limitations: (1) High-dimensional data present a huge challenge to ensemble clustering met...


Bibliographic Details
Main Authors: Jialin Tian, Yazhou Ren, Xiang Cheng
Format: Article
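Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Constraint propagation; ensemble clustering; high dimensional data; semi-supervised learning; stratified feature sampling
Online Access: https://ieeexplore.ieee.org/document/8825848/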
author Jialin Tian
Yazhou Ren
Xiang Cheng
collection DOAJ
description Ensemble Clustering (EC), which seeks to generate a consensus clustering by integrating multiple base clusterings, has attracted increasing attention. However, traditional EC methods typically have three main limitations: (1) High-dimensional data present a huge challenge to ensemble clustering methods. (2) Most EC algorithms cannot use prior information, e.g., pairwise constraints, to enhance the clustering performance. (3) Even in existing semi-supervised ensemble clustering methods, prior information is not sufficiently used, e.g., it is used only in generating base clusterings. To alleviate these problems, we propose Stratified Feature Sampling for Semi-Supervised Ensemble Clustering (SFS³EC). First, we develop a novel stratified feature sampling method, which can cope with high-dimensional data, guarantee the diversity of base clusterings, and, at the same time, reduce the risk that some features are never selected. Second, semi-supervised clustering, i.e., constraint propagation, is applied to obtain base clusterings. Finally, we propose to utilize prior information in both the base clustering generation process and the consensus process, which guarantees that prior information is sufficiently used. We conduct a series of experiments on ten real-world data sets to demonstrate the effectiveness of the proposed model.
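The description does not spell out how the stratified feature sampling works, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes the strata are formed by randomly partitioning the feature indices into near-equal groups and that each base clustering receives a fixed number of features from every stratum. The function name `stratified_feature_subsets` and all of its parameters are hypothetical.

```python
import numpy as np


def stratified_feature_subsets(n_features, n_strata, n_subsets, per_stratum, seed=0):
    """Draw feature subsets that each contain features from every stratum.

    Assumption (not from the record): strata are a random partition of the
    feature indices into n_strata near-equal groups.
    """
    if not 1 <= n_strata <= n_features:
        raise ValueError("n_strata must be between 1 and n_features")

    rng = np.random.default_rng(seed)

    # Partition the shuffled feature indices into disjoint strata.
    strata = np.array_split(rng.permutation(n_features), n_strata)

    subsets = []
    for _ in range(n_subsets):
        # Sample without replacement inside each stratum, so every subset
        # covers all strata while remaining low-dimensional.
        picks = [
            rng.choice(stratum, size=min(per_stratum, len(stratum)), replace=False)
            for stratum in strata
        ]
        subsets.append(np.sort(np.concatenate(picks)))
    return strata, subsets
```

Under these assumptions, each subset would be passed to one base clusterer, for example as `X[:, subset]` for a data matrix `X`. Because every stratum contributes to every subset, no block of features is skipped in any single draw, and the randomness within each stratum is what keeps the base clusterings diverse.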
first_indexed 2024-12-22T20:41:47Z
format Article
id doaj.art-5aa9d969ad3e4d09980f82f7fc8493fb
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-22T20:41:47Z
publishDate 2019-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-5aa9d969ad3e4d09980f82f7fc8493fb
2022-12-21T18:13:19Z
Volume 7, pages 128669-128675
DOI 10.1109/ACCESS.2019.2939581
IEEE document number 8825848
Jialin Tian, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
Yazhou Ren (https://orcid.org/0000-0001-7705-4603), School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
Xiang Cheng, Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
title Stratified Feature Sampling for Semi-Supervised Ensemble Clustering
topic Constraint propagation
ensemble clustering
high dimensional data
semi-supervised learning
stratified feature sampling
url https://ieeexplore.ieee.org/document/8825848/