Hidden Markov models for evolution and comparative genomics analysis.

The problem of reconstructing ancestral states, given a phylogeny and data from extant species, arises in a wide range of biological studies. A continuous-time Markov model of discrete-state evolution is generally used for this reconstruction. We modify this model to acco...

Bibliographic Details
Main Authors: Nadezda A Bykova, Alexander V Favorov, Andrey A Mironov
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2013-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3676395?pdf=render
author Nadezda A Bykova
Alexander V Favorov
Andrey A Mironov
collection DOAJ
description The problem of reconstructing ancestral states, given a phylogeny and data from extant species, arises in a wide range of biological studies. A continuous-time Markov model of discrete-state evolution is generally used for this reconstruction. We modify this model to account for the case in which the states of the extant species are uncertain. This situation arises, for example, when the states of extant species are predicted by a program and are therefore known only with some level of reliability, which is common in bioinformatics. The main idea is to formulate the problem as a hidden Markov model on a tree (tree HMM, tHMM), in which the basic continuous-time Markov model is extended with emission probabilities of the observed data (e.g. prediction scores) for each underlying discrete state. Our tHMM decoding algorithm allows us to predict states at the ancestral nodes as well as to refine states at the leaves on the basis of quantitative comparative genomics. A test on simulated data shows that the tHMM approach, applied to a continuous variable reflecting the probabilities of the states (i.e. the prediction score), appears to be more accurate than reconstruction from discrete state assignments defined by the best score threshold. We provide examples of applying our model to the evolutionary analysis of N-terminal signal peptides and transcription factor binding sites in bacteria. The program is freely available at http://bioinf.fbb.msu.ru/~nadya/tHMM and via a web service at http://bioinf.fbb.msu.ru/treehmmweb.
first_indexed 2024-12-17T13:29:14Z
format Article
id doaj.art-c00cf3f04c4848ebb1c51494e3e1cfa7
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-12-17T13:29:14Z
publishDate 2013-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-c00cf3f04c4848ebb1c51494e3e1cfa7 2022-12-21T21:46:39Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2013-01-01 8(6) e65012 10.1371/journal.pone.0065012
title Hidden Markov models for evolution and comparative genomics analysis.
url http://europepmc.org/articles/PMC3676395?pdf=render
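
The tree HMM idea summarized in the description can be illustrated with a small sketch. This is not the authors' tHMM program: it assumes two discrete states, a symmetric continuous-time Markov model with a single hypothetical rate, Gaussian emissions with made-up means and standard deviations, and posterior (upward-downward) decoding, which may differ from the decoding algorithm used in the paper. All function names, parameter values, the toy tree, and the leaf scores are illustrative assumptions.

```python
import math

def transition(rate, t):
    """Two-state symmetric CTMC: transition matrix over branch length t."""
    same = 0.5 + 0.5 * math.exp(-2.0 * rate * t)
    return [[same, 1.0 - same], [1.0 - same, same]]

def gaussian(x, mean, sd):
    """Emission density of an observed score x for one hidden state."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posteriors(tree, root, scores, rate=1.0,
               emis=((0.2, 0.15), (0.8, 0.15)), prior=(0.5, 0.5)):
    """Posterior state probabilities at every node of a rooted tree.

    tree maps each internal node to [(child, branch_length), ...];
    scores maps each leaf to its observed prediction score.
    The emission parameters (mean, sd) per state are hypothetical.
    """
    inside, msg, post = {}, {}, {}

    def up(v):
        # inside[v][s] = P(observations below v | state of v is s)
        if v not in tree:
            inside[v] = [gaussian(scores[v], m, sd) for m, sd in emis]
            return
        inside[v] = [1.0, 1.0]
        for c, t in tree[v]:
            up(c)
            P = transition(rate, t)
            msg[c] = [sum(P[s][k] * inside[c][k] for k in range(2))
                      for s in range(2)]
            inside[v] = [inside[v][s] * msg[c][s] for s in range(2)]

    def down(v, outside):
        # outside[s] = P(observations not below v, state of v is s)
        w = [outside[s] * inside[v][s] for s in range(2)]
        total = sum(w)
        post[v] = [x / total for x in w]
        for c, t in tree.get(v, []):
            rest = list(outside)
            for d, _ in tree[v]:
                if d != c:  # fold in the sibling subtrees
                    rest = [rest[s] * msg[d][s] for s in range(2)]
            P = transition(rate, t)
            down(c, [sum(rest[s] * P[s][k] for s in range(2))
                     for k in range(2)])

    up(root)
    down(root, list(prior))
    return post

# Toy example: leaf C has an ambiguous score, its relatives do not.
tree = {"root": [("A", 0.1), ("anc", 0.05)],
        "anc": [("B", 0.1), ("C", 0.1)]}
scores = {"A": 0.9, "B": 0.85, "C": 0.45}
post = posteriors(tree, "root", scores)
for node in ("root", "anc", "A", "B", "C"):
    print(node, [round(p, 3) for p in post[node]])
```

In this toy setup a plain score threshold of 0.5 would assign leaf C to state 0, whereas the tree-aware posterior shifts it toward state 1 because its close relatives score high. This is the kind of leaf refinement the description refers to, shown here only under the assumptions stated above.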