Automatically extracting functionally equivalent proteins from SwissProt

Abstract. Background: There is a frequent need to obtain sets of functionally equivalent homologous proteins (FEPs) from different species. While it is usually the case that orthology implies functional equivalence, this is not always true; therefore data...

Bibliographic Details
Main Authors: Martin Andrew CR, McMillan Lisa EM
Format: Article
Language: English
Published: BMC 2008-10-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/9/418
_version_ 1819199573923463168
author Martin Andrew CR
McMillan Lisa EM
author_facet Martin Andrew CR
McMillan Lisa EM
author_sort Martin Andrew CR
collection DOAJ
description Abstract
Background: There is a frequent need to obtain sets of functionally equivalent homologous proteins (FEPs) from different species. While it is usually the case that orthology implies functional equivalence, this is not always true; therefore datasets of orthologous proteins are not appropriate. The information relevant to extracting FEPs is contained in databanks such as UniProtKB/Swiss-Prot, and a manual analysis of these data allows FEPs to be extracted on a one-off basis. However, there has been no resource allowing the easy, automatic extraction of groups of FEPs – for example, all instances of protein C.
We have developed FOSTA, an automatically generated database of FEPs annotated as having the same function in UniProtKB/Swiss-Prot, which can be used for large-scale analysis. The method builds a candidate list of homologues and filters out functionally diverged proteins on the basis of functional annotations using a simple text mining approach.
Results: Large-scale evaluation of our FEP extraction method is difficult as there is no gold-standard dataset against which the method can be benchmarked. However, a manual analysis of five protein families confirmed a high level of performance. A more extensive comparison with two manually verified functional equivalence datasets also demonstrated very good performance.
Conclusion: In summary, FOSTA provides an automated analysis of annotations in UniProtKB/Swiss-Prot to enable groups of proteins already annotated as functionally equivalent to be extracted. Our results demonstrate that the vast majority of UniProtKB/Swiss-Prot functional annotations are of high quality, and that FOSTA can interpret annotations successfully. Where FOSTA is not successful, we are able to highlight inconsistencies in UniProtKB/Swiss-Prot annotation. Most of these would have presented equal difficulties for manual interpretation of annotations. We discuss limitations and possible future extensions to FOSTA, and recommend changes to the UniProtKB/Swiss-Prot format that would facilitate text-mining of UniProtKB/Swiss-Prot.
first_indexed 2024-12-23T03:18:30Z
format Article
id doaj.art-15ef1e421bdd48c8b8e5bb235d14f811
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-23T03:18:30Z
publishDate 2008-10-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-15ef1e421bdd48c8b8e5bb235d14f811 2022-12-21T18:02:02Z eng BMC BMC Bioinformatics 1471-2105 2008-10-01 9 1 418 10.1186/1471-2105-9-418 Automatically extracting functionally equivalent proteins from SwissProt Martin Andrew CR McMillan Lisa EM http://www.biomedcentral.com/1471-2105/9/418
spellingShingle Martin Andrew CR
McMillan Lisa EM
Automatically extracting functionally equivalent proteins from SwissProt
BMC Bioinformatics
title Automatically extracting functionally equivalent proteins from SwissProt
title_full Automatically extracting functionally equivalent proteins from SwissProt
title_fullStr Automatically extracting functionally equivalent proteins from SwissProt
title_full_unstemmed Automatically extracting functionally equivalent proteins from SwissProt
title_short Automatically extracting functionally equivalent proteins from SwissProt
title_sort automatically extracting functionally equivalent proteins from swissprot
url http://www.biomedcentral.com/1471-2105/9/418
work_keys_str_mv AT martinandrewcr automaticallyextractingfunctionallyequivalentproteinsfromswissprot
AT mcmillanlisaem automaticallyextractingfunctionallyequivalentproteinsfromswissprot