Generalizations of Markov model to characterize biological sequences

Bibliographic Details
Main Authors: Hannenhalli Sridhar, Wang Junwen
Format: Article
Language: English
Published: BMC 2005-09-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/6/219
Description
Summary:
Background: The currently used k-th order Markov models estimate the probability of generating a single nucleotide conditional upon the immediately preceding (gap = 0) k units. However, this approach neither takes into account the joint dependency of multiple neighboring nucleotides nor considers long-range dependency with gap > 0.
Results: We describe a configurable tool to explore generalizations of the standard Markov model. We evaluated whether sequence classification accuracy can be improved by using an alternative set of model parameters. The evaluation was done on four classes of biological sequences: CpG-poor promoters, all promoters, exons and nucleosome positioning sequences. Using di- and tri-nucleotides as the model unit significantly improved the sequence classification accuracy relative to the standard single-nucleotide model. In the case of nucleosome positioning sequences, optimal accuracy was achieved at a gap length of 4. Furthermore, in the plot of classification accuracy versus gap, a periodicity of 10–11 bp was observed, which might indicate structural preferences in the nucleosome positioning sequence. The tool is implemented in Java and is available for download at ftp://ftp.pcbi.upenn.edu/GMM/.
Conclusion: Markov modeling is an important component of many sequence analysis tools. We have extended the standard Markov model to incorporate joint and long-range dependencies between the sequence elements. The proposed generalizations of the Markov model are likely to improve the overall accuracy of sequence analysis tools.
ISSN: 1471-2105
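
Illustrative sketch (not part of the original record): the summary describes three model parameters, namely the unit size (single, di- or tri-nucleotide), the order k, and the gap between the context and the predicted unit. The Python code below is one possible reading of that description, not the authors' Java tool. The names `GappedMarkovModel`, `order`, `unit`, `gap`, `pseudocount` and `classify` are hypothetical, and the additive smoothing, the overlapping sliding window, and the restriction to the letters A, C, G, T are assumptions made here for illustration.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: sequences contain only these four letters


class GappedMarkovModel:
    """Hypothetical sketch: predict a `unit`-mer from the `order` units
    that end `gap` positions before it (gap=0, unit=1 is the standard
    k-th order Markov model)."""

    def __init__(self, order=1, unit=1, gap=0, pseudocount=1.0):
        self.order = order
        self.unit = unit
        self.gap = gap
        self.pseudocount = pseudocount
        # counts[context][target] = number of times target followed context
        self.counts = defaultdict(lambda: defaultdict(float))
        self.n_targets = len(ALPHABET) ** unit

    def _pairs(self, seq):
        """Yield (context, target) pairs with a sliding window of step 1."""
        ctx_len = self.order * self.unit
        start = ctx_len + self.gap
        for i in range(start, len(seq) - self.unit + 1):
            context = seq[i - self.gap - ctx_len : i - self.gap]
            target = seq[i : i + self.unit]
            yield context, target

    def fit(self, sequences):
        for seq in sequences:
            for context, target in self._pairs(seq.upper()):
                self.counts[context][target] += 1
        return self

    def prob(self, context, target):
        """Additively smoothed estimate of P(target | context)."""
        row = self.counts.get(context, {})
        total = sum(row.values())
        return (row.get(target, 0.0) + self.pseudocount) / (
            total + self.pseudocount * self.n_targets
        )

    def log_likelihood(self, seq):
        # With unit > 1 the windows overlap, so this is a score rather
        # than a properly normalized sequence probability.
        return sum(math.log(self.prob(c, t)) for c, t in self._pairs(seq.upper()))


def classify(seq, pos_model, neg_model):
    """Return True when the positive-class model scores seq higher."""
    return pos_model.log_likelihood(seq) > neg_model.log_likelihood(seq)


# Toy usage with made-up training sequences (not data from the article).
pos = GappedMarkovModel(order=1, unit=2, gap=0).fit(["CGCGCGCGCGCG"] * 3)
neg = GappedMarkovModel(order=1, unit=2, gap=0).fit(["ATATATATATAT"] * 3)
is_positive = classify("CGCGCGCG", pos, neg)
```

In this sketch, exploring the parameter space the summary refers to would amount to refitting both class models for each combination of `unit`, `order` and `gap` and comparing the resulting classification accuracy on held-out sequences.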