Local alignment of generalized <it>k</it>-base encoded DNA sequence

<p>Abstract</p> <p>Background</p> <p>DNA sequence comparison is a well-studied problem, in which two DNA sequences are compared using a weighted edit distance. Recent DNA sequencing technologies, however, observe an encoded form of the sequence, rather than each DNA base...

Bibliographic Details
Main Authors: Nelson Stanley F, Homer Nils, Merriman Barry
Format: Article
Language: English
Published: BMC 2010-06-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/11/347
_version_ 1818562426841333760
author Nelson Stanley F
Homer Nils
Merriman Barry
collection DOAJ
description <p>Abstract</p> <p>Background</p> <p>DNA sequence comparison is a well-studied problem, in which two DNA sequences are compared using a weighted edit distance. Recent DNA sequencing technologies, however, observe an encoded form of the sequence, rather than each DNA base individually. The encoded DNA sequence may contain technical errors, and therefore encoded sequencing errors must be incorporated when comparing an encoded DNA sequence to a reference DNA sequence.</p> <p>Results</p> <p>Although two-base encoding is currently used in practice, many other encoding schemes are possible, whereby two or more bases are encoded at a time. A generalized <it>k</it>-base encoding scheme is presented, whereby feasible higher-order encodings are better able to differentiate errors in the encoded sequence from true DNA sequence variants. A generalized version of the previous two-base encoding DNA sequence comparison algorithm is used to compare a <it>k</it>-base encoded sequence to a DNA reference sequence. Finally, simulations are performed to evaluate the power, the false positive and false negative SNP discovery rates, and the running time of <it>k</it>-base encoding compared to previous methods as well as to the standard DNA sequence comparison algorithm.</p> <p>Conclusions</p> <p>The novel generalized <it>k</it>-base encoding scheme and the resulting local alignment algorithm permit the development of higher-fidelity ligation-based next-generation sequencing technology. This bioinformatic solution affords greater robustness to errors, as well as lower false SNP discovery rates, at the sole cost of computational time.</p>
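The abstract's distinction between encoding errors and true sequence variants can be illustrated with ordinary two-base encoding. The following is a minimal sketch, not the authors' algorithm: it assumes the commonly used color-space convention in which each base gets a 2-bit code (A=0, C=1, G=2, T=3) and each color is the XOR of two adjacent base codes. The function names and example sequences are hypothetical.

```python
# Assumed 2-bit base codes for two-base (color-space) encoding.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def two_base_encode(dna):
    """Return one color (0-3) per adjacent base pair: XOR of the two base codes."""
    codes = [BASE_CODE[base] for base in dna]
    return [left ^ right for left, right in zip(codes, codes[1:])]


def color_differences(colors_a, colors_b):
    """Return the indices at which two equal-length color sequences differ."""
    return [i for i, (x, y) in enumerate(zip(colors_a, colors_b)) if x != y]


reference = "ACGTAC"
variant = "ACTTAC"  # hypothetical single-base substitution (G -> T at index 2)

ref_colors = two_base_encode(reference)
var_colors = two_base_encode(variant)

# A true single-base variant changes the two adjacent colors that overlap it.
snp_diffs = color_differences(ref_colors, var_colors)

# A simulated measurement error changes exactly one color.
err_colors = list(ref_colors)
err_colors[2] ^= 1
err_diffs = color_differences(ref_colors, err_colors)

print(ref_colors, var_colors, snp_diffs, err_diffs)
```

Under this convention, a single isolated color mismatch against the encoded reference is more plausibly an encoding error, while two adjacent mismatches are consistent with a base substitution. The record describes the generalized k-base scheme as extending this kind of discrimination by encoding more bases at a time.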
first_indexed 2024-12-14T01:03:35Z
format Article
id doaj.art-e330f5b501ea47409b107c8d2703ea56
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-14T01:03:35Z
publishDate 2010-06-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-e330f5b501ea47409b107c8d2703ea56 2022-12-21T23:23:05Z eng BMC BMC Bioinformatics 1471-2105 2010-06-01 11 1 347 10.1186/1471-2105-11-347 Local alignment of generalized <it>k</it>-base encoded DNA sequence Nelson Stanley F Homer Nils Merriman Barry <p>Abstract</p> <p>Background</p> <p>DNA sequence comparison is a well-studied problem, in which two DNA sequences are compared using a weighted edit distance. Recent DNA sequencing technologies, however, observe an encoded form of the sequence, rather than each DNA base individually. The encoded DNA sequence may contain technical errors, and therefore encoded sequencing errors must be incorporated when comparing an encoded DNA sequence to a reference DNA sequence.</p> <p>Results</p> <p>Although two-base encoding is currently used in practice, many other encoding schemes are possible, whereby two or more bases are encoded at a time. A generalized <it>k</it>-base encoding scheme is presented, whereby feasible higher-order encodings are better able to differentiate errors in the encoded sequence from true DNA sequence variants. A generalized version of the previous two-base encoding DNA sequence comparison algorithm is used to compare a <it>k</it>-base encoded sequence to a DNA reference sequence. Finally, simulations are performed to evaluate the power, the false positive and false negative SNP discovery rates, and the running time of <it>k</it>-base encoding compared to previous methods as well as to the standard DNA sequence comparison algorithm.</p> <p>Conclusions</p> <p>The novel generalized <it>k</it>-base encoding scheme and the resulting local alignment algorithm permit the development of higher-fidelity ligation-based next-generation sequencing technology. This bioinformatic solution affords greater robustness to errors, as well as lower false SNP discovery rates, at the sole cost of computational time.</p> http://www.biomedcentral.com/1471-2105/11/347
title Local alignment of generalized <it>k</it>-base encoded DNA sequence
url http://www.biomedcentral.com/1471-2105/11/347