SMART: unique splitting-while-merging framework for gene clustering.

Bibliographic Details
Main Authors: Rui Fa, David J Roberts, Asoke K Nandi
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2014-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3979766?pdf=render
_version_ 1811321569268465664
author Rui Fa
David J Roberts
Asoke K Nandi
author_sort Rui Fa
collection DOAJ
description Clustering algorithms depend heavily on their parameter settings: performance degrades significantly unless the parameters are set properly, yet they are difficult to set a priori. To address this issue, we propose a unique splitting-while-merging clustering framework, named "splitting merging awareness tactics" (SMART), which requires no a priori knowledge of either the number of clusters or even the possible range of this number. Unlike existing self-splitting algorithms, which over-cluster the dataset into a large number of clusters and then merge similar ones, our framework splits and merges clusters automatically during the clustering process and produces the most reliable clustering results by intrinsically integrating many clustering techniques and tasks. The SMART framework is implemented in two algorithms based on two distinct clustering paradigms: competitive learning and the finite mixture model. Many other algorithms can nevertheless be derived within the framework for different clustering paradigms. The minimum message length algorithm is integrated into the framework as the clustering selection criterion. The usefulness of the SMART framework and its algorithms is tested on demonstration datasets and simulated gene expression datasets, and two real microarray gene expression datasets are also studied using this approach. Across many performance metrics, all numerical results show that SMART is superior to the existing self-splitting algorithms and traditional algorithms it is compared with. The three main properties of the proposed SMART framework are: (1) it needs no dataset-dependent parameters or a priori knowledge about the datasets, (2) it is extendible to many different applications, and (3) it offers superior performance compared with counterpart algorithms.
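The splitting-while-merging idea in the description can be illustrated with a toy sketch. This is not the SMART algorithm: it works on 1-D points only, uses a plain 2-means split, and replaces the minimum message length criterion with a simple stand-in (within-cluster squared error plus a fixed per-cluster penalty). All names here (`split_while_merge`, `two_means`, `penalty`, `EPS`) are hypothetical, and the fixed `penalty` is exactly the kind of tuning parameter the paper says SMART avoids. The only point shown is that split and merge decisions are interleaved in one loop and both are accepted or rejected by the same selection criterion.

```python
from statistics import fmean

EPS = 1e-12  # tolerance so floating-point noise never counts as an improvement


def sse(cluster):
    """Sum of squared deviations from the cluster mean."""
    m = fmean(cluster)
    return sum((x - m) ** 2 for x in cluster)


def cost(clusters, penalty):
    """Stand-in selection criterion (NOT minimum message length)."""
    return sum(sse(c) for c in clusters) + penalty * len(clusters)


def two_means(cluster, iters=20):
    """Split one 1-D cluster in two with 2-means seeded at its min and max."""
    a, b = min(cluster), max(cluster)
    left, right = list(cluster), []
    for _ in range(iters):
        left = [x for x in cluster if abs(x - a) <= abs(x - b)]
        right = [x for x in cluster if abs(x - a) > abs(x - b)]
        if not right:  # all points identical: nothing to split
            break
        a, b = fmean(left), fmean(right)
    return left, right


def split_while_merge(points, penalty=1.0):
    """Start from one cluster; try a split and a merge in every pass."""
    clusters = [list(points)]
    changed = True
    while changed:
        changed = False
        # Split step: accept the first split that lowers the criterion.
        for i, c in enumerate(clusters):
            left, right = two_means(c)
            if not right:
                continue
            trial = clusters[:i] + [left, right] + clusters[i + 1:]
            if cost(trial, penalty) < cost(clusters, penalty) - EPS:
                clusters, changed = trial, True
                break
        # Merge step: try merging the two clusters with the closest means.
        clusters.sort(key=fmean)
        if len(clusters) > 1:
            j = min(range(len(clusters) - 1),
                    key=lambda k: fmean(clusters[k + 1]) - fmean(clusters[k]))
            trial = clusters[:j] + [clusters[j] + clusters[j + 1]] + clusters[j + 2:]
            if cost(trial, penalty) < cost(clusters, penalty) - EPS:
                clusters, changed = trial, True
    return sorted(clusters, key=fmean)


if __name__ == "__main__":
    data = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8, 9.0, 9.1, 8.9]
    for cluster in split_while_merge(data):
        print(sorted(cluster))
```

On well-separated toy data like this the merge step seldom fires; it matters when an earlier split turns out to be poor once neighbouring clusters have changed.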
first_indexed 2024-04-13T13:18:52Z
format Article
id doaj.art-e54cba2b9a174311b903dcbc5604308d
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-04-13T13:18:52Z
publishDate 2014-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-e54cba2b9a174311b903dcbc5604308d; 2022-12-22T02:45:22Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2014-01-01; 9; 4; e94141; 10.1371/journal.pone.0094141; SMART: unique splitting-while-merging framework for gene clustering.; Rui Fa; David J Roberts; Asoke K Nandi; http://europepmc.org/articles/PMC3979766?pdf=render
title SMART: unique splitting-while-merging framework for gene clustering.
url http://europepmc.org/articles/PMC3979766?pdf=render