Variable-order sequence modeling improves bacterial strain discrimination for Ion Torrent DNA reads

Bibliographic Details
Main Authors: Thomas M. Poulsen, Martin Frith
Format: Article
Language: English
Published: BMC 2017-06-01
Series: BMC Bioinformatics
Subjects: Sequence alignment, Higher order, HMM, Ion Torrent, Pathogen detection
Online Access: http://link.springer.com/article/10.1186/s12859-017-1710-0
author Thomas M. Poulsen
Martin Frith
collection DOAJ
description Abstract. Background: Genome sequencing provides a powerful tool for pathogen detection and can help resolve outbreaks that pose risks to public safety and health. Mapping DNA reads to genomes plays a fundamental role in this approach, and accurate alignment and classification of sequencing data are crucial. Standard mapping methods crudely treat each base as independent of its neighbors. Accuracy might be improved with higher-order paired hidden Markov models (HMMs), which model neighbor effects, but these introduce design and implementation issues that have typically made them impractical for read mapping. We present a variable-order paired HMM, which we term VarHMM, that addresses the central issues of higher-order modeling for sequence alignment. Results: Compared with existing alignment methods, VarHMM can model higher-order distributions and quantify alignment probabilities in greater detail and with greater accuracy. In a series of comparison tests in which Ion Torrent-sequenced DNA was mapped to similar bacterial strains, VarHMM consistently provided better strain discrimination than any of the other alignment methods we compared it with. Conclusions: Our results demonstrate the advantages of higher-order probability distribution modeling and suggest that further development of such models would also benefit read mapping in a range of other applications.
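The abstract contrasts standard mappers, which score each base independently of its neighbors, with higher-order paired HMMs, whose probabilities can depend on neighboring bases. The following is a minimal Python sketch of that contrast, not VarHMM itself: a textbook three-state (match/insert/delete) pair-HMM forward algorithm that computes log P(read | reference), run once with a zero-order match emission and once with a hypothetical first-order emission conditioned on the preceding reference base. All parameter values (DELTA, EPS, the accuracies), the function names, and the toy strain sequences are illustrative assumptions, not taken from the paper.

```python
import math

# Hypothetical gap-open / gap-extend probabilities (not from the paper).
DELTA = 0.05
EPS = 0.3


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def order0_emission(read_base, ref_base, prev_ref):
    """Zero-order model: P(read base | ref base), ignoring neighbors."""
    return 0.97 if read_base == ref_base else 0.01


def order1_emission(read_base, ref_base, prev_ref):
    """Hypothetical first-order model: accuracy drops when the reference
    base repeats its left neighbor (a crude homopolymer context)."""
    accuracy = 0.90 if ref_base == prev_ref else 0.99
    return accuracy if read_base == ref_base else (1 - accuracy) / 3


def forward_log_likelihood(read, ref, emission):
    """Global pair-HMM forward algorithm returning log P(read | ref).

    States: M (read base aligned to a ref base), I (extra read base,
    emitted uniformly with probability 0.25), D (skipped ref base)."""
    n, m = len(read), len(ref)
    neg = -math.inf
    lm = [[neg] * (m + 1) for _ in range(n + 1)]
    li = [[neg] * (m + 1) for _ in range(n + 1)]
    ld = [[neg] * (m + 1) for _ in range(n + 1)]
    lm[0][0] = 0.0  # the begin state behaves like M
    log_mm = math.log(1 - 2 * DELTA)
    log_open = math.log(DELTA)
    log_ext = math.log(EPS)
    log_close = math.log(1 - EPS)
    log_q = math.log(0.25)
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                prev_ref = ref[j - 2] if j > 1 else None
                e = math.log(emission(read[i - 1], ref[j - 1], prev_ref))
                lm[i][j] = e + logsumexp([
                    lm[i - 1][j - 1] + log_mm,
                    li[i - 1][j - 1] + log_close,
                    ld[i - 1][j - 1] + log_close,
                ])
            if i > 0:
                li[i][j] = log_q + logsumexp([
                    lm[i - 1][j] + log_open,
                    li[i - 1][j] + log_ext,
                ])
            if j > 0:
                ld[i][j] = logsumexp([
                    lm[i][j - 1] + log_open,
                    ld[i][j - 1] + log_ext,
                ])
    return logsumexp([lm[n][m], li[n][m], ld[n][m]])


def classify(read, strains, emission):
    """Return the strain whose reference gives the highest log P(read | ref),
    together with all per-strain scores."""
    scores = {name: forward_log_likelihood(read, seq, emission)
              for name, seq in strains.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    toy_strains = {"strain_A": "ACGTTTGCA", "strain_B": "ACGTATGCA"}
    for model in (order0_emission, order1_emission):
        best, scores = classify("ACGTTTGCA", toy_strains, model)
        print(model.__name__, best, scores)
```

Assigning a read to whichever candidate strain yields the higher forward likelihood mirrors, in toy form, the strain-discrimination task described in the abstract. A variable-order model would additionally decide how much neighboring context to condition on at each position, which this sketch does not attempt.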
first_indexed 2024-12-10T22:45:30Z
format Article
id doaj.art-dc52b5a6284d42afbfb4f4145839f6b9
institution Directory of Open Access Journals
issn 1471-2105
language English
last_indexed 2024-12-10T22:45:30Z
publishDate 2017-06-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-dc52b5a6284d42afbfb4f4145839f6b9 | eng | BMC | BMC Bioinformatics | ISSN 1471-2105 | 2017-06-01 | vol. 18, no. 1 | doi:10.1186/s12859-017-1710-0 | Thomas M. Poulsen and Martin Frith, Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST) | Sequence alignment; Higher order; HMM; Ion Torrent; Pathogen detection
title Variable-order sequence modeling improves bacterial strain discrimination for Ion Torrent DNA reads
topic Sequence alignment
Higher order
HMM
Ion Torrent
Pathogen detection
url http://link.springer.com/article/10.1186/s12859-017-1710-0