RepSeq – A database of amino acid repeats present in lower eukaryotic pathogens


Bibliographic Details
Main Authors: Smith Deborah F, Lower Ryan PJ, Depledge Daniel P
Format: Article
Language: English
Published: BMC 2007-04-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/8/122
_version_ 1818063346551750656
author Smith Deborah F
Lower Ryan PJ
Depledge Daniel P
author_sort Smith Deborah F
collection DOAJ
description Abstract
Background: Amino acid repeat-containing proteins have a broad range of functions and their identification is of relevance to many experimental biologists. In human-infective protozoan parasites (such as the Kinetoplastid and Plasmodium species), they are implicated in immune evasion and have been shown to influence virulence and pathogenicity. RepSeq (http://repseq.gugbe.com) is a new database of amino acid repeat-containing proteins found in lower eukaryotic pathogens. The RepSeq database is accessed via a web-based application which also provides links to related online tools and databases for further analyses.
Results: The RepSeq algorithm typically identifies more than 98% of repeat-containing proteins and is capable of identifying both perfect and mismatch repeats. The proportion of proteins that contain repeat elements varies greatly between different families and even species (3–35% of the total protein content). The most common motif type is the Sequence Repeat Region (SRR) – a repeated motif containing multiple different amino acid types. Proteins containing Single Amino Acid Repeats (SAARs) and Di-Peptide Repeats (DPRs) typically account for 0.5–1.0% of the total protein number. Notable exceptions are P. falciparum and D. discoideum, in which 33.67% and 34.28% respectively of the predicted proteomes consist of repeat-containing proteins. These numbers are due to large insertions of low complexity single and multi-codon repeat regions.
Conclusion: The RepSeq database provides a repository for repeat-containing proteins found in parasitic protozoa. The database allows for both individual and cross-species proteome analyses and also allows users to upload sequences of interest for analysis by the RepSeq algorithm. Identification of repeat-containing proteins provides researchers with a defined subset of proteins which can be analysed by expression profiling and functional characterisation, thereby facilitating study of pathogenicity and virulence factors in the parasitic protozoa. While primarily designed for kinetoplastid work, the RepSeq algorithm and database retain full functionality when used to analyse other species.
first_indexed 2024-12-10T14:18:39Z
format Article
id doaj.art-22cee32617b34298a621ad4b6b604315
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-10T14:18:39Z
publishDate 2007-04-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-22cee32617b34298a621ad4b6b604315 2022-12-22T01:45:16Z eng BMC BMC Bioinformatics 1471-2105 2007-04-01 8 1 122 10.1186/1471-2105-8-122 RepSeq – A database of amino acid repeats present in lower eukaryotic pathogens Smith Deborah F Lower Ryan PJ Depledge Daniel P http://www.biomedcentral.com/1471-2105/8/122
title RepSeq – A database of amino acid repeats present in lower eukaryotic pathogens
title_sort repseq a database of amino acid repeats present in lower eukaryotic pathogens
url http://www.biomedcentral.com/1471-2105/8/122
work_keys_str_mv AT smithdeborahf repseqadatabaseofaminoacidrepeatspresentinlowereukaryoticpathogens
AT lowerryanpj repseqadatabaseofaminoacidrepeatspresentinlowereukaryoticpathogens
AT depledgedanielp repseqadatabaseofaminoacidrepeatspresentinlowereukaryoticpathogens