CLEAN: CLustering Enrichment ANalysis

Bibliographic Details
Main Authors: Medvedovic Mario, Hu Zhen, Joshi Vineet K, Freudenberg Johannes M
Format: Article
Language: English
Published: BMC, 2009-07-01
Series: BMC Bioinformatics, vol. 10, article 234
ISSN: 1471-2105
DOI: 10.1186/1471-2105-10-234
Online Access: http://www.biomedcentral.com/1471-2105/10/234

Abstract

Background

Integration of biological knowledge encoded in various lists of functionally related genes has become one of the most important aspects of analyzing genome-wide functional genomics data. In the context of cluster analysis, the functional coherence of clusters established through such analyses has been used to identify biologically meaningful clusters, compare clustering algorithms, and identify biological pathways associated with the biological process under investigation.

Results

We developed a computational framework for analytically and visually integrating knowledge-based functional categories with the cluster analysis of genomics data. The framework is based on a simple, conceptually appealing, and biologically interpretable gene-specific functional coherence score (CLEAN score). The score is derived by correlating the clustering structure as a whole with functional categories of interest. We directly demonstrate that integrating biological knowledge in this way improves the reproducibility of conclusions derived from cluster analysis. The CLEAN score differentiates between the levels of functional coherence of genes within the same cluster based on their membership in enriched functional categories. We show that this property results in higher reproducibility across independent datasets and yields more informative genes for distinguishing different sample types than scores based on traditional cluster-wide analysis. We also demonstrate the utility of the CLEAN framework for comparing clusterings produced by different algorithms. CLEAN was implemented as an add-on R package and can be downloaded at http://Clusteranalysis.org. The package integrates routines for calculating gene-specific functional coherence scores with the open-source, interactive, Java-based viewer Functional TreeView (FTreeView).

Conclusion

Our results indicate that using the gene-specific functional coherence score, rather than traditional cluster-wide scores, improves the reproducibility of conclusions about clusters of co-expressed genes. Gene-specific coherence scores also simplify comparisons of clusterings produced by different clustering algorithms and provide a simple tool for selecting genes with a "functionally coherent" expression profile.
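
As a rough illustration of the within-cluster distinction described in the Results paragraph, the Python sketch below scores each gene by the strongest enrichment (one-sided hypergeometric test, equivalent to a one-sided Fisher's exact test) among the functional categories that gene belongs to, evaluated in the cluster containing it, and contrasts this with a single cluster-wide score per cluster. It is a simplified sketch under stated assumptions, not the CLEAN R package's method or API: it uses a flat partition instead of the whole clustering structure, applies no multiple-testing correction, and the function names (`hypergeom_upper_tail`, `gene_scores`, `cluster_wide_scores`) and the toy data are hypothetical.

```python
import math
from typing import Dict, Set


def hypergeom_upper_tail(total: int, in_category: int, cluster_size: int, overlap: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(total, in_category, cluster_size).

    This equals the one-sided (over-representation) Fisher's exact test
    p-value for the category-by-cluster 2x2 table.
    """
    denom = math.comb(total, cluster_size)
    upper = min(in_category, cluster_size)
    num = sum(
        math.comb(in_category, i) * math.comb(total - in_category, cluster_size - i)
        for i in range(overlap, upper + 1)
    )
    return num / denom


def _score(p: float) -> float:
    # -log10(p), reported as 0.0 when nothing is enriched (p == 1).
    return -math.log10(p) if p < 1.0 else 0.0


def gene_scores(clusters: Dict[str, Set[str]], categories: Dict[str, Set[str]]) -> Dict[str, float]:
    """Hypothetical gene-specific score (assumes a flat partition: each gene in
    exactly one cluster). Only categories containing the gene are considered."""
    universe = set().union(*clusters.values())
    total = len(universe)
    scores: Dict[str, float] = {}
    for members in clusters.values():
        for gene in members:
            best_p = 1.0
            for cat_genes in categories.values():
                cat = cat_genes & universe  # restrict categories to clustered genes
                if gene not in cat:
                    continue  # categories the gene does not belong to are ignored
                p = hypergeom_upper_tail(total, len(cat), len(members), len(cat & members))
                best_p = min(best_p, p)
            scores[gene] = _score(best_p)
    return scores


def cluster_wide_scores(clusters: Dict[str, Set[str]], categories: Dict[str, Set[str]]) -> Dict[str, float]:
    """Hypothetical cluster-wide score: one value per cluster, taken from its
    most enriched category regardless of which member genes belong to it."""
    universe = set().union(*clusters.values())
    total = len(universe)
    out: Dict[str, float] = {}
    for name, members in clusters.items():
        best_p = 1.0
        for cat_genes in categories.values():
            cat = cat_genes & universe
            p = hypergeom_upper_tail(total, len(cat), len(members), len(cat & members))
            best_p = min(best_p, p)
        out[name] = _score(best_p)
    return out


# Made-up toy data: 10 genes in two clusters, two hypothetical categories.
EXAMPLE_CLUSTERS = {
    "A": {f"g{i}" for i in range(1, 6)},   # g1 to g5
    "B": {f"g{i}" for i in range(6, 11)},  # g6 to g10
}
EXAMPLE_CATEGORIES = {
    "F1": {"g1", "g2", "g3", "g4"},
    "F2": {"g5", "g6", "g7", "g8", "g9", "g10"},
}

if __name__ == "__main__":
    print(gene_scores(EXAMPLE_CLUSTERS, EXAMPLE_CATEGORIES))
    print(cluster_wide_scores(EXAMPLE_CLUSTERS, EXAMPLE_CATEGORIES))
```

On this toy data, genes g1 to g4 and g6 to g10 each score log10(42) ≈ 1.62, while g5, whose only category (F2) is not enriched in cluster A, scores 0. Both clusters receive the same cluster-wide score of log10(42), so a cluster-wide score alone cannot separate g5 from its cluster mates; that is the kind of within-cluster distinction the abstract attributes to a gene-specific score.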