The development of PIPA: an integrated and automated pipeline for genome-wide protein function annotation

Abstract. Background: Automated protein function prediction methods are needed to keep pace with high-throughput sequencing. With the existence of many programs and databases for inferring different protein functions, a pipeline that properly integrates t...


Bibliographic Details
Main Authors: Stevens Fred J, Johnson Seth, Desai Valmik, Zavaljevski Nela, Yu Chenggang, Reifman Jaques
Format: Article
Language: English
Published: BMC 2008-01-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/9/52
collection DOAJ
description Abstract

Background: Automated protein function prediction methods are needed to keep pace with high-throughput sequencing. With the existence of many programs and databases for inferring different protein functions, a pipeline that properly integrates these resources will benefit from the advantages of each method. However, integrated systems usually do not provide mechanisms to generate customized databases to predict particular protein functions. Here, we describe a tool termed PIPA (Pipeline for Protein Annotation) that has these capabilities.

Results: PIPA annotates protein functions by combining the results of multiple programs and databases, such as InterPro and the Conserved Domains Database, into common Gene Ontology (GO) terms. The major algorithms implemented in PIPA are: (1) a profile database generation algorithm, which generates customized profile databases to predict particular protein functions, (2) an automated ontology mapping generation algorithm, which maps various classification schemes into GO, and (3) a consensus algorithm to reconcile annotations from the integrated programs and databases.

PIPA's profile generation algorithm is employed to construct the enzyme profile database CatFam, which predicts catalytic functions described by Enzyme Commission (EC) numbers. Validation tests show that CatFam yields average recall and precision larger than 95.0%. CatFam is integrated with PIPA.

We use an association rule mining algorithm to automatically generate mappings between terms of two ontologies from annotated sample proteins. Incorporating the ontologies' hierarchical topology into the algorithm increases the number of generated mappings. In particular, it generates 40.0% additional mappings from the Clusters of Orthologous Groups (COG) to EC numbers and a six-fold increase in mappings from COG to GO terms. The mappings to EC numbers show a very high precision (99.8%) and recall (96.6%), while the mappings to GO terms show moderate precision (80.0%) and low recall (33.0%).

Our consensus algorithm for GO annotation is based on the computation and propagation of likelihood scores associated with GO terms. The test results suggest that, for a given recall, the application of the consensus algorithm yields higher precision than when consensus is not used.

Conclusion: The algorithms implemented in PIPA provide automated genome-wide protein function annotation based on reconciled predictions from multiple resources.
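As an illustration of the mapping step summarized in the abstract, the following is a minimal sketch of association rule mining between two annotation vocabularies. It is not PIPA's implementation: the function names (`mine_mappings`, `ec_ancestors`), the support and confidence thresholds, the way the hierarchy is used (expanding each EC number to its parent classes), and the toy data are all assumptions made for this example.

```python
from collections import Counter


def ec_ancestors(ec):
    """Return an EC number plus its parent classes, e.g. 1.1.1.1 -> 1.1.1.-, 1.1.-.-, 1.-.-.-.

    Hypothetical helper: one simple way to expose the EC hierarchy to the miner.
    """
    parts = ec.split(".")
    terms = [ec]
    for depth in (3, 2, 1):
        terms.append(".".join(parts[:depth] + ["-"] * (4 - depth)))
    return terms


def mine_mappings(proteins, min_support=2, min_confidence=0.8, expand=None):
    """Mine rules source_term -> target_term from annotated proteins.

    proteins: iterable of (source_terms, target_terms) pairs, one per protein.
    expand:   optional function mapping a target term to itself plus ancestors.
    Returns {(source, target): (support, confidence)}.
    """
    source_counts = Counter()
    pair_counts = Counter()
    for source_terms, target_terms in proteins:
        targets = set()
        for t in target_terms:
            targets.update(expand(t) if expand else [t])
        for s in set(source_terms):
            source_counts[s] += 1
            for t in targets:
                pair_counts[(s, t)] += 1

    rules = {}
    for (s, t), support in pair_counts.items():
        # confidence = P(target | source), estimated from the sample proteins
        confidence = support / source_counts[s]
        if support >= min_support and confidence >= min_confidence:
            rules[(s, t)] = (support, confidence)
    return rules


# Toy, made-up annotations: (COG terms, EC numbers) per protein.
sample = [
    ({"COG_A"}, {"1.1.1.1"}),
    ({"COG_A"}, {"1.1.1.1"}),
    ({"COG_A"}, {"1.1.1.2"}),
    ({"COG_B"}, {"2.7.1.1"}),
]

flat_rules = mine_mappings(sample, min_support=2, min_confidence=0.6)
hier_rules = mine_mappings(sample, min_support=2, min_confidence=0.6,
                           expand=ec_ancestors)
```

In this toy sample, expanding targets along the hierarchy lets the three COG_A proteins jointly support the parent class `1.1.1.-`, a rule that the flat run cannot produce. This is consistent in spirit with the abstract's observation that using the hierarchical topology increases the number of generated mappings, though the actual algorithm may exploit the hierarchy differently.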
first_indexed 2024-12-18T19:15:04Z
format Article
id doaj.art-daaa6433d0fb4b88b2e1b4c5cec563e3
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-18T19:15:04Z
publishDate 2008-01-01
publisher BMC
record_format Article
series BMC Bioinformatics
doi 10.1186/1471-2105-9-52
title The development of PIPA: an integrated and automated pipeline for genome-wide protein function annotation
url http://www.biomedcentral.com/1471-2105/9/52