A comparison of Pfam and MEROPS: Two databases, one comprehensive, and one specialised.


Bibliographic Details
Main Authors: Barrett Alan J, Rawlings Neil D, Studholme David J, Bateman Alex
Format: Article
Language: English
Published: BMC 2003-05-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/4/17
collection DOAJ
description Abstract

Background: We wished to compare two databases based on sequence similarity: one that aims to be comprehensive in its coverage of known sequences, and one that specialises in a relatively small subset of known sequences. One of the motivations behind this study was quality control. Pfam is a comprehensive collection of alignments and hidden Markov models representing families of proteins and domains. MEROPS is a catalogue and classification of enzymes with proteolytic activity (peptidases or proteases). These secondary databases are used by researchers worldwide, yet their contents are not peer reviewed. Therefore, we hoped that a systematic comparison of the contents of Pfam and MEROPS would highlight missing members and false-positives leading to improvements in quality of both databases. An additional reason for carrying out this study was to explore the extent of consensus in the definition of a protein family.

Results: About half (89 out of 174) of the peptidase families in MEROPS overlapped single Pfam families. A further 32 MEROPS families overlapped multiple Pfam families. Where possible, new Pfam families were built to represent most of the MEROPS families that did not overlap Pfam. When comparing the numbers of sequences found in the overlap between a MEROPS family and its corresponding Pfam family, in most cases the overlap was substantial (52 pairs of MEROPS and Pfam families had an intersection size of greater than 75% of the union) but there were some differences in the sets of sequences included in the MEROPS families versus the overlapping Pfam families.

Conclusions: A number of the discrepancies between MEROPS families and their corresponding Pfam families arose from differences in the aims and philosophies of the two databases. Examination of some of the discrepancies highlighted additional members of families, which have subsequently been added in both Pfam and MEROPS. This has led to improvements in the quality of both databases. Overall there was a great deal of consensus between the databases in definitions of a protein family.
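The overlap measure quoted in the Results (intersection size as a fraction of the union) corresponds to what is commonly called the Jaccard index. The record does not describe how the authors computed it, so the following is only a minimal sketch under stated assumptions: each family is modelled as a set of sequence identifiers, and the family names, sequence identifiers, and function names (`jaccard`, `classify_overlaps`) are all hypothetical and chosen for illustration.

```python
def jaccard(a, b):
    """Return |a & b| / |a | b| for two collections of sequence identifiers."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty families: define the overlap as zero
    return len(a & b) / len(union)


def classify_overlaps(merops, pfam):
    """For each MEROPS family, list the Pfam families that share at least
    one sequence with it, together with the intersection/union ratio."""
    result = {}
    for m_name, m_seqs in merops.items():
        hits = {}
        for p_name, p_seqs in pfam.items():
            if set(m_seqs) & set(p_seqs):
                hits[p_name] = jaccard(m_seqs, p_seqs)
        result[m_name] = hits
    return result


# Hypothetical toy data; these are not real MEROPS or Pfam families.
merops = {
    "M_fam_A": {"seq1", "seq2", "seq3", "seq4"},
    "M_fam_B": {"seq5", "seq6"},
    "M_fam_C": {"seq9"},
}
pfam = {
    "P_fam_1": {"seq1", "seq2", "seq3", "seq4", "seq7"},
    "P_fam_2": {"seq5"},
    "P_fam_3": {"seq6", "seq8"},
}

overlaps = classify_overlaps(merops, pfam)
for m_name, hits in overlaps.items():
    if not hits:
        kind = "no Pfam overlap"
    elif len(hits) == 1:
        kind = "single Pfam family"
    else:
        kind = "multiple Pfam families"
    print(m_name, kind, hits)
```

In this toy data, `M_fam_A` overlaps a single Pfam family with a ratio above 0.75, `M_fam_B` overlaps two Pfam families, and `M_fam_C` overlaps none, mirroring the three situations the Results distinguish.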
first_indexed 2024-12-20T01:07:50Z
format Article
id doaj.art-d86c4b14367940f9824cb28beba6e1f5
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-20T01:07:50Z
publishDate 2003-05-01
publisher BMC
record_format Article
series BMC Bioinformatics
url http://www.biomedcentral.com/1471-2105/4/17