Local alignment of generalized <it>k</it>-base encoded DNA sequence
<p>Abstract</p> <p>Background</p> <p>DNA sequence comparison is a well-studied problem, in which two DNA sequences are compared using a weighted edit distance. Recent DNA sequencing technologies however observe an encoded form of the sequence, rather than each DNA base...
Main Authors: | Nelson Stanley F, Homer Nils, Merriman Barry |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2010-06-01 |
Series: | BMC Bioinformatics |
Online Access: | http://www.biomedcentral.com/1471-2105/11/347 |
_version_ | 1818562426841333760 |
---|---|
author | Nelson Stanley F Homer Nils Merriman Barry |
author_facet | Nelson Stanley F Homer Nils Merriman Barry |
author_sort | Nelson Stanley F |
collection | DOAJ |
description | <p>Abstract</p> <p>Background</p> <p>DNA sequence comparison is a well-studied problem, in which two DNA sequences are compared using a weighted edit distance. Recent DNA sequencing technologies however observe an encoded form of the sequence, rather than each DNA base individually. The encoded DNA sequence may contain technical errors, and therefore encoded sequencing errors must be incorporated when comparing an encoded DNA sequence to a reference DNA sequence.</p> <p>Results</p> <p>Although two-base encoding is currently used in practice, many other encoding schemes are possible, whereby two or more bases are encoded at a time. A generalized <it>k</it>-base encoding scheme is presented, whereby feasible higher order encodings are better able to differentiate errors in the encoded sequence from true DNA sequence variants. A generalized version of the previous two-base encoding DNA sequence comparison algorithm is used to compare a <it>k</it>-base encoded sequence to a DNA reference sequence. Finally, simulations are performed to evaluate the power, the false positive and false negative SNP discovery rates, and the performance time of <it>k</it>-base encoding compared to previous methods as well as to the standard DNA sequence comparison algorithm.</p> <p>Conclusions</p> <p>The novel generalized <it>k</it>-base encoding scheme and resulting local alignment algorithm permit the development of higher fidelity ligation-based next generation sequencing technology. This bioinformatic solution affords greater robustness to errors, as well as lower false SNP discovery rates, only at the cost of computational time.</p> |
first_indexed | 2024-12-14T01:03:35Z |
format | Article |
id | doaj.art-e330f5b501ea47409b107c8d2703ea56 |
institution | Directory Open Access Journal |
issn | 1471-2105 |
language | English |
last_indexed | 2024-12-14T01:03:35Z |
publishDate | 2010-06-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-e330f5b501ea47409b107c8d2703ea56 2022-12-21T23:23:05Z eng BMC BMC Bioinformatics 1471-2105 2010-06-01 11 1 347 10.1186/1471-2105-11-347 Local alignment of generalized <it>k</it>-base encoded DNA sequence Nelson Stanley F Homer Nils Merriman Barry http://www.biomedcentral.com/1471-2105/11/347 |
spellingShingle | Nelson Stanley F Homer Nils Merriman Barry Local alignment of generalized <it>k</it>-base encoded DNA sequence BMC Bioinformatics |
title | Local alignment of generalized <it>k</it>-base encoded DNA sequence |
title_full | Local alignment of generalized <it>k</it>-base encoded DNA sequence |
title_fullStr | Local alignment of generalized <it>k</it>-base encoded DNA sequence |
title_full_unstemmed | Local alignment of generalized <it>k</it>-base encoded DNA sequence |
title_short | Local alignment of generalized <it>k</it>-base encoded DNA sequence |
title_sort | local alignment of generalized it k it base encoded dna sequence |
url | http://www.biomedcentral.com/1471-2105/11/347 |
work_keys_str_mv | AT nelsonstanleyf localalignmentofgeneralizeditkitbaseencodeddnasequence AT homernils localalignmentofgeneralizeditkitbaseencodeddnasequence AT merrimanbarry localalignmentofgeneralizeditkitbaseencodeddnasequence |
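
The abstract contrasts errors in the encoded sequence with true DNA sequence variants. A minimal sketch of the two-base case may make that distinction concrete. It assumes the commonly described colour-space convention (A=0, C=1, G=2, T=3, with each colour being the XOR of adjacent base codes) and a known starting base. The function names `encode_two_base` and `decode_two_base` are hypothetical and are not taken from the article, and the sketch does not implement the article's generalized <it>k</it>-base scheme or its alignment algorithm.

```python
# Illustrative two-base (colour-space) encoding; names and conventions are assumptions.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_two_base(seq, start="A"):
    """Encode a DNA string as colours 0-3; `start` is an assumed known leading base."""
    prev = BASE_TO_BITS[start]
    colors = []
    for base in seq:
        cur = BASE_TO_BITS[base]
        colors.append(prev ^ cur)  # colour depends on two adjacent bases
        prev = cur
    return colors


def decode_two_base(colors, start="A"):
    """Decode colours back to bases by accumulating XORs from the starting base."""
    prev = BASE_TO_BITS[start]
    bases = []
    for color in colors:
        prev ^= color
        bases.append(BITS_TO_BASE[prev])
    return "".join(bases)


reference = "ACGT"
ref_colors = encode_two_base(reference)

# A true single-base variant (G -> T) changes two adjacent colours.
snp_colors = encode_two_base("ACTT")
snp_diff = [i for i, (a, b) in enumerate(zip(ref_colors, snp_colors)) if a != b]

# A single colour error changes every decoded base from that point onward.
err_colors = list(ref_colors)
err_colors[2] = 2
err_decoded = decode_two_base(err_colors)
```

Under these assumptions, an isolated colour mismatch is more plausibly a sequencing error, whereas a variant shows up as a consistent pair of adjacent colour changes; an aligner that compares encoded reads to a DNA reference can exploit that difference.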