Convex clustering: an attractive alternative to hierarchical clustering.

The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominan...

Bibliographic Details
Main Authors: Gary K Chen, Eric C Chi, John Michael O Ranola, Kenneth Lange
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2015-05-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1004228
collection DOAJ
description The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/.
first_indexed 2024-12-16T06:57:49Z
id doaj.art-f34cef8860c4434a9c2d07a25b8165fd
institution Directory Open Access Journal
issn 1553-734X
1553-7358
last_indexed 2024-12-16T06:57:49Z
spelling doaj.art-f34cef8860c4434a9c2d07a25b8165fd; 2022-12-21T22:40:15Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; ISSN 1553-734X, 1553-7358; 2015-05-01; volume 11, issue 5, article e1004228; doi:10.1371/journal.pcbi.1004228