Filtering high-throughput protein-protein interaction data using a combination of genomic features
Abstract. Background: Protein-protein interaction data used in the creation or prediction of molecular networks is usually obtained from large scale or high-throughput experiments. This experimental data is liable to contain a large number of spurious int...
Main Authors: | Patil Ashwini, Nakamura Haruki |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2005-04-01 |
Series: | BMC Bioinformatics |
Online Access: | http://www.biomedcentral.com/1471-2105/6/100 |
_version_ | 1811278597928779776 |
---|---|
author | Patil Ashwini, Nakamura Haruki |
author_sort | Patil Ashwini |
collection | DOAJ |
description | Abstract. Background: Protein-protein interaction data used in the creation or prediction of molecular networks is usually obtained from large scale or high-throughput experiments. This experimental data is liable to contain a large number of spurious interactions. Hence, there is a need to validate the interactions and filter out the incorrect data before using them in prediction studies. Results: In this study, we use a combination of 3 genomic features – structurally known interacting Pfam domains, Gene Ontology annotations and sequence homology – as a means to assign reliability to the protein-protein interactions in Saccharomyces cerevisiae determined by high-throughput experiments. Using Bayesian network approaches, we show that protein-protein interactions from high-throughput data supported by one or more genomic features have a higher likelihood ratio and hence are more likely to be real interactions. Our method has a high sensitivity (90%) and good specificity (63%). We show that 56% of the interactions from high-throughput experiments in Saccharomyces cerevisiae have high reliability. We use the method to estimate the number of true interactions in the high-throughput protein-protein interaction data sets in Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens to be 27%, 18% and 68% respectively. Our results are available for searching and downloading at http://helix.protein.osaka-u.ac.jp/htp/. Conclusion: A combination of genomic features that include sequence, structure and annotation information is a good predictor of true interactions in large and noisy high-throughput data sets. The method has a very high sensitivity and good specificity and can be used to assign a likelihood ratio, corresponding to the reliability, to each interaction. |
first_indexed | 2024-04-13T00:38:47Z |
format | Article |
id | doaj.art-39ef4f02a2134902a9ee29691ac15030 |
institution | Directory of Open Access Journals |
issn | 1471-2105 |
language | English |
last_indexed | 2024-04-13T00:38:47Z |
publishDate | 2005-04-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-39ef4f02a2134902a9ee29691ac15030 2022-12-22T03:10:16Z eng BMC BMC Bioinformatics 1471-2105 2005-04-01 6 1 100 10.1186/1471-2105-6-100 Filtering high-throughput protein-protein interaction data using a combination of genomic features Patil Ashwini Nakamura Haruki http://www.biomedcentral.com/1471-2105/6/100 |
title | Filtering high-throughput protein-protein interaction data using a combination of genomic features |
url | http://www.biomedcentral.com/1471-2105/6/100 |
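
The description field says each interaction is scored with a likelihood ratio based on three genomic features, and that the method is evaluated by sensitivity and specificity. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a naive Bayes simplification in which the features are treated as independent. The function names, feature keys, and all probabilities are hypothetical illustration values that do not come from the paper.

```python
from math import prod


def likelihood_ratio(features, p_given_true, p_given_false):
    """Combine binary features into one likelihood ratio (naive Bayes assumption).

    features: dict mapping feature name -> bool (is the feature present?)
    p_given_true / p_given_false: hypothetical probabilities of observing each
    feature among true and false interactions, respectively.
    """
    ratios = []
    for name, present in features.items():
        pt, pf = p_given_true[name], p_given_false[name]
        # A present feature contributes P(f|true)/P(f|false); an absent one
        # contributes the ratio of the complementary probabilities.
        ratios.append(pt / pf if present else (1 - pt) / (1 - pf))
    return prod(ratios)


def sensitivity_specificity(scores, labels, cutoff):
    """Treat score >= cutoff as 'predicted true' and compare with known labels."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical conditional probabilities for the three feature types.
P_TRUE = {"pfam_domains": 0.4, "go_annotation": 0.8, "homology": 0.5}
P_FALSE = {"pfam_domains": 0.1, "go_annotation": 0.4, "homology": 0.25}

all_supported = {"pfam_domains": True, "go_annotation": True, "homology": True}
unsupported = {"pfam_domains": False, "go_annotation": False, "homology": False}

lr_supported = likelihood_ratio(all_supported, P_TRUE, P_FALSE)
lr_unsupported = likelihood_ratio(unsupported, P_TRUE, P_FALSE)

# Toy evaluation set: four scored interactions with known true/false labels.
sens, spec = sensitivity_specificity(
    [16.0, 0.1, 2.0, 0.5], [True, False, True, True], cutoff=1.0
)
```

In this toy setting, an interaction supported by all three features gets a likelihood ratio above 1 and an unsupported one gets a ratio below 1, which matches the abstract's statement that feature-supported interactions are more likely to be real. The cutoff of 1.0 is likewise an assumption chosen for illustration.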