transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences


Bibliographic Details
Main Author:	Bininda-Emonds Olaf RP
Format:	Article
Language:	English
Published:	BMC 2005-06-01
Series:	BMC Bioinformatics, vol. 6, article 156
Online Access:	http://www.biomedcentral.com/1471-2105/6/156
ISSN:	1471-2105
DOI:	10.1186/1471-2105-6-156

Abstract

Background
Alignments of homologous DNA sequences are crucial for comparative genomics and phylogenetic analysis. However, multiple alignment represents a computationally difficult problem. For protein-coding DNA sequences, it is more advantageous in terms of both speed and accuracy to align the amino-acid sequences specified by the DNA sequences rather than the DNA sequences themselves. Many implementations making use of this concept of "translated alignments" are incomplete in the sense that they require the user to manually translate the DNA sequences and to perform the amino-acid alignment. As such, they are not well suited to large-scale automated alignments of large and/or numerous DNA data sets.

Results
transAlign is an open-source Perl script that aligns protein-coding DNA sequences via their amino-acid translations to take advantage of the superior multiple-alignment capabilities and speed of an amino-acid alignment. It operates by translating each DNA sequence into its corresponding amino-acid sequence, passing the entire matrix to ClustalW for alignment, and then back-translating the resulting amino-acid alignment to derive the aligned DNA sequences. In the translation step, transAlign determines the optimal orientation and reading frame for each DNA sequence according to the desired genetic code. It also checks for apparent frame shifts in the DNA sequences and can handle frame-shifted sequences in one of three ways (delete, align as amino acids regardless, or profile align as DNA). As a set of comparative benchmarks derived from six protein-coding genes for mammals shows, the strategy implemented in transAlign always improves the speed and usually the apparent accuracy of the alignment of protein-coding DNA sequences.

Conclusion
transAlign represents one of few full and cross-platform implementations of the concept of translated alignments. Both the advantages accruing from performing a translated alignment and the suite of user-definable options available in the program mean that transAlign is ideally suited for large-scale automated alignments of very large and/or very numerous protein-coding DNA data sets. However, the good performance offered by the program also translates to the alignment of any set of protein-coding sequences. transAlign, including the source code, is freely available at http://www.tierzucht.tum.de/Bininda-Emonds/ (under "Programs").
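The translate, align, back-translate strategy described in the Results paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not transAlign's Perl code. It uses only the standard genetic code. It picks the orientation and reading frame with the fewest internal stop codons, which is an assumed criterion. It skips the ClustalW step by taking a hand-written amino-acid alignment, and it does not detect or handle frame shifts. The function names `translate`, `best_frame`, `back_translate`, and `reverse_complement` are hypothetical.

```python
"""Minimal sketch of a translated alignment: translate, (align), back-translate.

Assumptions (not taken from transAlign itself): standard genetic code only,
reading frame chosen by fewest internal stop codons, and the amino-acid
alignment is supplied by hand instead of being produced by ClustalW.
"""
from itertools import product
from typing import Tuple

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# with the third codon position varying fastest.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (case-insensitive)."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; unknown codons (e.g. containing N) become X."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def best_frame(dna: str) -> Tuple[str, int, str, str]:
    """Return (strand, offset, coding_dna, protein) for the best of the six frames.

    "Best" here (an assumption) means fewest internal stop codons, then the
    longest translation; remaining ties go to the first frame searched
    (forward before reverse, lower offset first). Any trailing partial codon
    is dropped from coding_dna.
    """
    candidates = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            protein = translate(seq[offset:])
            coding = seq[offset:offset + 3 * len(protein)]
            internal_stops = protein[:-1].count("*")  # a terminal stop is tolerated
            candidates.append(((internal_stops, -len(protein)), strand, offset, coding, protein))
    _, strand, offset, coding, protein = min(candidates, key=lambda c: c[0])
    return strand, offset, coding, protein


def back_translate(aligned_protein: str, coding_dna: str) -> str:
    """Thread codons onto an aligned amino-acid string: each '-' becomes '---'."""
    residues = sum(1 for aa in aligned_protein if aa != "-")
    if 3 * residues != len(coding_dna):
        raise ValueError("aligned residues do not match the number of codons")
    codons = iter([coding_dna[i:i + 3] for i in range(0, len(coding_dna), 3)])
    return "".join("---" if aa == "-" else next(codons) for aa in aligned_protein)


if __name__ == "__main__":
    raw = {"seq_a": "ATGAAACCCGGG", "seq_b": "ATGAAAGGG"}
    coding = {name: best_frame(s)[2] for name, s in raw.items()}
    # Hypothetical amino-acid alignment standing in for ClustalW output.
    aligned = {"seq_a": "MKPG", "seq_b": "MK-G"}
    for name in raw:
        print(name, back_translate(aligned[name], coding[name]))
```

Run as a script, this prints `seq_a ATGAAACCCGGG` and `seq_b ATGAAA---GGG`. The single amino-acid gap in `seq_b` becomes a three-base gap, so codons stay intact in the DNA alignment. That codon-preserving property is the point of aligning via the translations.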
