Integration of relational and hierarchical network information for protein function prediction

Bibliographic Details
Main Authors: Jiang Xiaoyu, Nariai Naoki, Steffen Martin, Kasif Simon, Kolaczyk Eric D
Format: Article
Language: English
Published: BMC 2008-08-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/9/350
collection DOAJ
description Abstract
Background: In the current climate of high-throughput computational biology, inferring a protein's function from related measurements, such as protein-protein interaction relations, has become a canonical task. Most existing methods treat this task as a classification problem, solved term by term for each term in a database such as the Gene Ontology (GO), a popular, rigorous vocabulary for biological functions. However, ontology structures are essentially hierarchies, with top-to-bottom annotation rules that protein function predictions should in principle follow. Currently, the most common approach to imposing these hierarchical constraints on network-based classifiers is to apply transitive closure to the predictions.
Results: We propose a probabilistic framework that integrates relational data, in the form of a protein-protein interaction network, with a hierarchically structured database of terms, in the form of the GO database, for the purpose of protein function prediction. At the heart of our framework is a factorization of local neighborhood information in the protein-protein interaction network across successive ancestral terms in the GO hierarchy. We introduce a classifier within this framework, with a computationally efficient implementation, whose GO-term predictions naturally obey hierarchical 'true-path' consistency from root to leaves, without the need for further post-processing.
Conclusion: A cross-validation study, using data from the yeast Saccharomyces cerevisiae, shows that our method offers substantial improvements over both standard 'guilt-by-association' (i.e., Nearest-Neighbor) and more refined Markov random field methods, whether in their original form or when post-processed to artificially impose 'true-path' consistency. Further analysis of the results indicates that these improvements are associated with increased predictive capability (i.e., increased positive predictive value), and that this increase is consistent across GO-term depths. Additional in silico validation on a collection of new annotations recently added to GO confirms the advantages suggested by the cross-validation study. Taken as a whole, our results show that a hierarchical approach to network-based protein function prediction that exploits the ontological structure of protein annotation databases in a principled manner can offer substantial advantages over the successive application of 'flat' network-based methods.
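The abstract contrasts two ways of obtaining 'true-path' consistent predictions: post-processing flat, term-by-term predictions with transitive closure, and building consistency into the probabilities themselves. The following is a minimal sketch of both ideas on a toy hierarchy, not the classifier described in the article. The term names, scores, threshold, and function names are all hypothetical, and the chained-probability part assumes a tree (one parent per term), which is simpler than the real GO structure.

```python
from typing import Dict, List, Set

# Hypothetical toy hierarchy (child -> parents); these are not real GO terms.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "metabolism": ["root"],
    "transport": ["root"],
    "lipid_metabolism": ["metabolism"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every term reachable from `term` by following parent links."""
    seen: Set[str] = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def is_true_path_consistent(predicted: Set[str],
                            parents: Dict[str, List[str]]) -> bool:
    """A prediction set is consistent if it contains all ancestors of its terms."""
    return all(ancestors(term, parents) <= predicted for term in predicted)


def transitive_closure(predicted: Set[str],
                       parents: Dict[str, List[str]]) -> Set[str]:
    """Post-process flat predictions by adding every ancestor of a predicted term."""
    closed = set(predicted)
    for term in predicted:
        closed |= ancestors(term, parents)
    return closed


def path_probability(term: str, conditional: Dict[str, float],
                     parents: Dict[str, List[str]]) -> float:
    """Multiply P(term | parent) along the path to the root (toy tree only)."""
    prob = conditional[term]
    while parents[term]:
        (term,) = parents[term]  # assumption: exactly one parent per term
        prob *= conditional[term]
    return prob


THRESHOLD = 0.5  # hypothetical decision threshold

# Flat, term-by-term scores (made up) can violate the true-path rule.
FLAT_SCORES = {"root": 1.0, "metabolism": 0.4,
               "transport": 0.2, "lipid_metabolism": 0.7}
flat_predicted = {t for t, s in FLAT_SCORES.items() if s >= THRESHOLD}

# Conditional scores P(term | parent) (made up) chained along the hierarchy.
CONDITIONAL = {"root": 1.0, "metabolism": 0.8,
               "transport": 0.3, "lipid_metabolism": 0.9}
chained_predicted = {t for t in PARENTS
                     if path_probability(t, CONDITIONAL, PARENTS) >= THRESHOLD}

if __name__ == "__main__":
    print("flat:", sorted(flat_predicted))
    print("flat + closure:", sorted(transitive_closure(flat_predicted, PARENTS)))
    print("chained:", sorted(chained_predicted))
```

Because each factor in the chained product is at most 1, a term's probability can never exceed that of its ancestors, so thresholding the chained probabilities yields a consistent set without a separate closure step. In this toy example the flat predictions need the closure step to recover the missing parent term.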
id doaj.art-117419d201354d3bbd62df66ac5a4f37
institution Directory Open Access Journal
issn 1471-2105
doi 10.1186/1471-2105-9-350