Benchmarking tools for the alignment of functional noncoding DNA


Bibliographic Details
Main Authors: Jens Stoye, Casey M Bergman, Daniel A Pollard, Susan E Celniker, Michael B Eisen
Format: Article
Language: English
Published: BMC 2004-01-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/5/6
author Stoye Jens
Bergman Casey M
Pollard Daniel A
Celniker Susan E
Eisen Michael B
collection DOAJ
description Abstract
Background: Numerous tools have been developed to align genomic sequences. However, their relative performance in specific applications remains poorly characterized. Alignments of protein-coding sequences typically have been benchmarked against "correct" alignments inferred from structural data. For noncoding sequences, where such independent validation is lacking, simulation provides an effective means to generate "correct" alignments with which to benchmark alignment tools.
Results: Using rates of noncoding sequence evolution estimated from the genus Drosophila, we simulated alignments over a range of divergence times under varying models incorporating point substitution, insertion/deletion events, and short blocks of constrained sequences such as those found in cis-regulatory regions. We then compared "correct" alignments generated by a modified version of the ROSE simulation platform to alignments of the simulated derived sequences produced by eight pairwise alignment tools (Avid, BlastZ, Chaos, ClustalW, DiAlign, Lagan, Needle, and WABA) to determine the off-the-shelf performance of each tool. As expected, the ability to align noncoding sequences accurately decreases with increasing divergence for all tools, and declines faster in the presence of insertion/deletion evolution. Global alignment tools (Avid, ClustalW, Lagan, and Needle) typically have higher sensitivity over entire noncoding sequences as well as in constrained sequences. Local tools (BlastZ, Chaos, and WABA) have lower overall sensitivity as a consequence of incomplete coverage, but have high specificity to detect constrained sequences as well as high sensitivity within the subset of sequences they align. Tools such as DiAlign, which generate both local and global outputs, produce alignments of constrained sequences with both high sensitivity and specificity for divergence distances in the range of 1.25–3.0 substitutions per site.
Conclusion: For species with genomic properties similar to Drosophila, we conclude that a single pair of optimally diverged species analyzed with a high-performance alignment tool can yield accurate and specific alignments of functionally constrained noncoding sequences. Further algorithm development, optimization of alignment parameters, and benchmarking studies will be necessary to extract the maximal biological information from alignments of functional noncoding DNA.
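The benchmarking idea in the abstract, comparing a tool's alignment with a simulated "correct" alignment, can be illustrated with a minimal sketch. The code below is not the authors' pipeline: the function names, the toy sequences, and the exact metric definitions are assumptions made for illustration. Here sensitivity is taken as the fraction of reference residue pairs recovered by the tool, and the second score is the fraction of the tool's pairs that are correct (a precision-style measure, which may differ from the specificity definition used in the article).

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue-index pairs that share a column."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0  # positions in the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def score_alignment(reference, predicted):
    """Score a predicted pairwise alignment against a reference alignment.

    Returns (sensitivity, precision); both definitions are assumptions
    chosen for this sketch, not taken from the article.
    """
    ref = aligned_pairs(*reference)
    pred = aligned_pairs(*predicted)
    shared = ref & pred
    sensitivity = len(shared) / len(ref) if ref else 0.0
    precision = len(shared) / len(pred) if pred else 0.0
    return sensitivity, precision


# Hypothetical toy data: the same two sequences aligned two ways.
reference = ("ACGT-ACGT", "ACGTTAC-T")  # stands in for the simulated truth
predicted = ("ACGT-ACGT", "ACGTTACT-")  # stands in for a tool's output

sensitivity, precision = score_alignment(reference, predicted)
print(f"sensitivity={sensitivity:.3f} precision={precision:.3f}")
```

Under these definitions, a local tool that aligns only part of the sequences would tend to score lower on sensitivity while keeping a high precision-style score, which matches the pattern the abstract reports for incomplete coverage.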
first_indexed 2024-04-12T15:21:50Z
format Article
id doaj.art-753950aa0677404fae2b40db5cdd1af5
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-04-12T15:21:50Z
publishDate 2004-01-01
publisher BMC
record_format Article
series BMC Bioinformatics
title Benchmarking tools for the alignment of functional noncoding DNA
url http://www.biomedcentral.com/1471-2105/5/6