Simultaneous clustering of multiple gene expression and physical interaction datasets.

Bibliographic Details
Main Authors: Manikandan Narayanan, Adrian Vetta, Eric E Schadt, Jun Zhu
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2010-04-01
Series: PLoS Computational Biology
Online Access:http://europepmc.org/articles/PMC2855327?pdf=render
ISSN: 1553-734X, 1553-7358
DOI: 10.1371/journal.pcbi.1000742
Volume: 6
Issue: 4
Article: e1000742
Source: Directory of Open Access Journals (DOAJ)

Description
Many genome-wide datasets are routinely generated to study different aspects of biological systems, but integrating them to obtain a coherent view of the underlying biology remains a challenge. We propose simultaneous clustering of multiple networks as a framework to integrate large-scale datasets on the interactions among and activities of cellular components. Specifically, we develop an algorithm, JointCluster, that finds sets of genes that cluster well in multiple networks of interest, such as coexpression networks summarizing correlations among the expression profiles of genes and physical networks describing protein-protein and protein-DNA interactions among genes or gene products. Our algorithm provides an efficient solution to a well-defined problem of jointly clustering networks, using techniques that permit certain theoretical guarantees on the quality of the detected clustering relative to the optimal clustering. These guarantees, coupled with an effective scaling heuristic and the flexibility to handle multiple heterogeneous networks, make JointCluster an advance over earlier approaches. Simulation results showed JointCluster to be more robust than alternative methods in recovering clusters implanted in networks with high false positive rates. In a systematic evaluation of JointCluster and some earlier approaches for the combined analysis of the yeast physical network and two gene expression datasets under glucose and ethanol growth conditions, JointCluster discovers clusters that are more consistently enriched for various reference classes capturing different aspects of yeast biology, or that yield better coverage of the analysed genes. These robust clusters, which are supported across multiple genomic datasets and diverse reference classes, agree with known biology of yeast under these growth conditions, elucidate the genetic control of coordinated transcription, and enable functional predictions for a number of uncharacterized genes.
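The record does not state JointCluster's exact objective. As a rough illustration of what "a set of genes that clusters well in multiple networks" can mean, the sketch below builds a toy coexpression network from expression profiles and a toy physical network from an edge list, then scores a candidate gene set by its worst conductance across both networks. Conductance as the per-network quality measure, the worst-case combination rule, the correlation cutoff, all function names, and the toy data are assumptions made for illustration; this is not the authors' JointCluster implementation.

```python
from itertools import combinations
from math import sqrt

# Hypothetical representation: a network maps each gene to {neighbour: weight};
# undirected edges are stored in both directions.
Network = dict[str, dict[str, float]]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def coexpression_network(profiles: dict[str, list[float]], threshold: float) -> Network:
    """Link two genes when their profiles correlate above an assumed cutoff."""
    net: Network = {g: {} for g in profiles}
    for g, h in combinations(profiles, 2):
        r = pearson(profiles[g], profiles[h])
        if r > threshold:
            net[g][h] = net[h][g] = r
    return net


def edge_list_network(genes: list[str], edges: list[tuple[str, str]]) -> Network:
    """Unweighted physical network (e.g. protein-protein edges) over the genes."""
    net: Network = {g: {} for g in genes}
    for g, h in edges:
        net[g][h] = net[h][g] = 1.0
    return net


def conductance(net: Network, cluster: set[str]) -> float:
    """cut(S, V-S) / min(vol(S), vol(V-S)); lower means a better-separated cluster."""
    cut = sum(w for g in cluster for h, w in net[g].items() if h not in cluster)
    vol_in = sum(w for g in cluster for w in net[g].values())
    vol_out = sum(w for g in net if g not in cluster for w in net[g].values())
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else 1.0


def joint_score(networks: list[Network], cluster: set[str]) -> float:
    """Score a gene set by its worst (highest) conductance over all networks."""
    return max(conductance(net, cluster) for net in networks)


# Toy data: A/B are co-expressed, C/D are co-expressed (anti-correlated with A/B).
genes = ["A", "B", "C", "D"]
profiles = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [1.1, 2.1, 2.9, 4.2],
    "C": [4.0, 3.0, 2.0, 1.0],
    "D": [3.9, 3.1, 2.2, 0.8],
}
coexp = coexpression_network(profiles, threshold=0.9)
physical = edge_list_network(genes, [("A", "B"), ("C", "D"), ("B", "C")])

print(joint_score([coexp, physical], {"A", "B"}))  # 0.333...: the B-C physical edge is the bottleneck
print(joint_score([coexp, physical], {"A", "C"}))  # 1.0: poorly separated in both networks
```

Taking the maximum over networks rewards only gene sets that are well separated in every dataset, which mirrors the abstract's emphasis on clusters supported across multiple genomic datasets; a weighted average would be an equally plausible alternative.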