Bayesian inference for genomic data integration reduces misclassification rate in predicting protein-protein interactions.

Protein-protein interactions (PPIs) are essential to most fundamental cellular processes. There has been increasing interest in reconstructing PPI networks. However, several critical difficulties exist in obtaining reliable predictions. Notably, false positive rates can exceed 80%. Er...

Bibliographic Details
Main Authors:	Chuanhua Xing, David B Dunson
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2011-07-01
Series:	PLoS Computational Biology
Online Access:	http://europepmc.org/articles/PMC3145649?pdf=render

author	Chuanhua Xing David B Dunson
collection	DOAJ
description	Protein-protein interactions (PPIs) are essential to most fundamental cellular processes. There has been increasing interest in reconstructing PPI networks. However, several critical difficulties exist in obtaining reliable predictions. Notably, false positive rates can exceed 80%. Correcting errors at each generating source can be both time-consuming and inefficient, because a single test can rarely cover the errors introduced at multiple levels of data processing. We propose a novel Bayesian integration method, termed nonparametric Bayes ensemble learning (NBEL), to lower the misclassification rate (both false positives and false negatives) by automatically up-weighting the most informative data sources while down-weighting less informative and biased ones. Extensive studies indicate that NBEL is significantly more robust than classic naïve Bayes to unreliable, error-prone and contaminated data. On a large human data set, our NBEL approach predicts many more PPIs than naïve Bayes. This suggests that previous studies may contain large numbers of not only false positives but also false negatives. Validation on two high-quality human PPI datasets supports our observations. Our experiments demonstrate that it is feasible to predict high-throughput PPIs computationally with substantially reduced false positives and false negatives. The ability to predict large numbers of PPIs both reliably and automatically may inspire the use of computational approaches to correct data errors in general, and may speed up high-quality PPI prediction. Such reliable prediction may provide a solid platform for other studies, such as protein function prediction and the roles of PPIs in disease susceptibility.
first_indexed	2024-12-22T12:48:03Z
format	Article
id	doaj.art-b417538f58be4b179943ff5586590434
institution	Directory of Open Access Journals
issn	1553-734X 1553-7358
language	English
last_indexed	2024-12-22T12:48:03Z
publishDate	2011-07-01
publisher	Public Library of Science (PLoS)
record_format	Article
series	PLoS Computational Biology
spelling	doaj.art-b417538f58be4b179943ff5586590434; 2022-12-21T18:25:17Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X; 1553-7358; 2011-07-01; 7; 7; e1002110; 10.1371/journal.pcbi.1002110
title	Bayesian inference for genomic data integration reduces misclassification rate in predicting protein-protein interactions.
url	http://europepmc.org/articles/PMC3145649?pdf=render

Similar Items