Comparison of methods for estimating the nucleotide substitution matrix

Bibliographic Details
Main Authors: Huttley Gavin A, Yap Von Bing, McDonald Daniel, Oscamou Maribeth, Lladser Manuel E, Knight Rob
Format: Article
Language: English
Published: BMC 2008-12-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/9/511
collection DOAJ
description Abstract

Background: The nucleotide substitution rate matrix is a key parameter of molecular evolution. Several methods for inferring this parameter have been proposed, with different mathematical bases. These methods include counting sequence differences and taking the log of the resulting probability matrices, methods based on Markov triples, and maximum likelihood methods that infer the substitution probabilities that lead to the most likely model of evolution. However, the speed and accuracy of these methods have not been compared.

Results: The methods differ in speed by orders of magnitude (ranging from 1 ms to 10 s per matrix), but differences in the accuracy of rate matrix reconstruction appear to be relatively small. Encouragingly, relatively simple and fast methods can provide results at least as accurate as far more complex and computationally intensive methods, especially when the sequences to be compared are relatively short.

Conclusion: Based on the conditions tested, we recommend the method of Gojobori et al. (1982) for long sequences (> 600 nucleotides), and the method of Goldman et al. (1996) for shorter sequences (< 600 nucleotides). The method of Barry and Hartigan (1987) can provide somewhat more accuracy, measured as the Euclidean distance between the true and inferred matrices, on long sequences (> 2000 nucleotides) at the expense of substantially longer computation time. The availability of methods that are both fast and accurate will allow us to gain a global picture of change in the nucleotide substitution rate matrix on a genome-wide scale across the tree of life.
first_indexed 2024-12-17T09:01:07Z
id doaj.art-78633f65f64341e7bbde8cbe88c59909
institution Directory Open Access Journal
issn 1471-2105
spelling doaj.art-78633f65f64341e7bbde8cbe88c59909 2022-12-21T21:55:42Z eng BMC BMC Bioinformatics 1471-2105 2008-12-01 9 1 511 10.1186/1471-2105-9-511