A new protein linear motif benchmark for multiple sequence alignment software

Abstract. Background: Linear motifs (LMs) are abundant short regulatory sites used for modulating the functions of many eukaryotic proteins. They play important roles in post-translational modification, cell compartment targeting, docking sites for regula...

Bibliographic Details
Main Authors: Poch Olivier, Chica Claudia, Perrodou Emmanuel, Gibson Toby J, Thompson Julie D
Format: Article
Language: English
Published: BMC 2008-04-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/9/213
_version_ 1819121574182125568
author Poch Olivier
Chica Claudia
Perrodou Emmanuel
Gibson Toby J
Thompson Julie D
author_facet Poch Olivier
Chica Claudia
Perrodou Emmanuel
Gibson Toby J
Thompson Julie D
author_sort Poch Olivier
collection DOAJ
description Abstract. Background: Linear motifs (LMs) are abundant short regulatory sites used for modulating the functions of many eukaryotic proteins. They play important roles in post-translational modification, cell compartment targeting, docking sites for regulatory complex assembly and protein processing and cleavage. Methods for LM detection are now being developed that are strongly dependent on scores for motif conservation in homologous proteins. However, most LMs are found in natively disordered polypeptide segments that evolve rapidly, unhindered by structural constraints on the sequence. These regions of modular proteins are difficult to align using classical multiple sequence alignment programs that are specifically optimised to align the globular domains. As a consequence, poor motif alignment quality is hindering efforts to detect new LMs. Results: We have developed a new benchmark, as part of the BAliBASE suite, designed to assess the ability of standard multiple alignment methods to detect and align LMs. The reference alignments are organised into different test sets representing real alignment problems and contain examples of experimentally verified functional motifs, extracted from the Eukaryotic Linear Motif (ELM) database. The benchmark has been used to evaluate and compare a number of multiple alignment programs. With distantly related proteins, the worst alignment program correctly aligns 48% of LMs compared to 73% for the best program. However, the performance of all the programs is adversely affected by the introduction of other sequences containing false positive motifs. The ranking of the alignment programs based on LM alignment quality is similar to that observed when considering full-length protein alignments; however, little correlation was observed between LM and overall alignment quality for individual alignment test cases. Conclusion: We have shown that none of the programs currently available is capable of reliably aligning LMs in distantly related sequences and we have highlighted a number of specific problems. The results of the tests suggest possible ways to improve program accuracy for difficult, divergent sequences.
first_indexed 2024-12-22T06:38:43Z
format Article
id doaj.art-c7ccc5d1217e4e9bbb0e3d468f1a841c
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-22T06:38:43Z
publishDate 2008-04-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-c7ccc5d1217e4e9bbb0e3d468f1a841c 2022-12-21T18:35:30Z eng BMC BMC Bioinformatics 1471-2105 2008-04-01 9 1 213 10.1186/1471-2105-9-213 http://www.biomedcentral.com/1471-2105/9/213
title A new protein linear motif benchmark for multiple sequence alignment software
url http://www.biomedcentral.com/1471-2105/9/213