lra: A long read aligner for sequences and contigs.
It is computationally challenging to detect variation by aligning single-molecule sequencing (SMS) reads, or contigs from SMS assemblies. One approach to efficiently align SMS reads is sparse dynamic programming (SDP), where optimal chains of exact matches are found between the sequence and the geno...
Main Authors: | Jingwen Ren, Mark J P Chaisson |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2021-06-01 |
Series: | PLoS Computational Biology |
Online Access: | https://doi.org/10.1371/journal.pcbi.1009078 |
_version_ | 1797869606738591744 |
---|---|
author | Jingwen Ren Mark J P Chaisson |
author_sort | Jingwen Ren |
collection | DOAJ |
description | It is computationally challenging to detect variation by aligning single-molecule sequencing (SMS) reads, or contigs from SMS assemblies. One approach to efficiently align SMS reads is sparse dynamic programming (SDP), where optimal chains of exact matches are found between the sequence and the genome. While straightforward implementations of SDP penalize gaps with a cost that is a linear function of gap length, biological variation is more accurately represented when gap cost is a concave function of gap length. We have developed a method, lra, that uses SDP with a concave-cost gap penalty, and used lra to align long-read sequences from PacBio and Oxford Nanopore (ONT) instruments as well as de novo assembly contigs. This alignment approach increases sensitivity and specificity for SV discovery, particularly for variants above 1 kb and when discovering variation from ONT reads, while having runtimes that are comparable (1.05-3.76×) to current methods. When applied to calling variation from de novo assembly contigs, there is a 3.2% increase in Truvari F1 score compared to minimap2+htsbox. lra is available in Bioconda (https://anaconda.org/bioconda/lra) and GitHub (https://github.com/ChaissonLab/LRA). |
first_indexed | 2024-04-10T00:15:17Z |
format | Article |
id | doaj.art-374191bdee9242da9ac6ee49931ee816 |
institution | Directory Open Access Journal |
issn | 1553-734X 1553-7358 |
language | English |
last_indexed | 2024-04-10T00:15:17Z |
publishDate | 2021-06-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS Computational Biology |
spelling | doaj.art-374191bdee9242da9ac6ee49931ee816; 2023-03-16T05:31:12Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X; 1553-7358; 2021-06-01; 17(6): e1009078; 10.1371/journal.pcbi.1009078; lra: A long read aligner for sequences and contigs.; Jingwen Ren; Mark J P Chaisson; It is computationally challenging to detect variation by aligning single-molecule sequencing (SMS) reads, or contigs from SMS assemblies. One approach to efficiently align SMS reads is sparse dynamic programming (SDP), where optimal chains of exact matches are found between the sequence and the genome. While straightforward implementations of SDP penalize gaps with a cost that is a linear function of gap length, biological variation is more accurately represented when gap cost is a concave function of gap length. We have developed a method, lra, that uses SDP with a concave-cost gap penalty, and used lra to align long-read sequences from PacBio and Oxford Nanopore (ONT) instruments as well as de novo assembly contigs. This alignment approach increases sensitivity and specificity for SV discovery, particularly for variants above 1 kb and when discovering variation from ONT reads, while having runtimes that are comparable (1.05-3.76×) to current methods. When applied to calling variation from de novo assembly contigs, there is a 3.2% increase in Truvari F1 score compared to minimap2+htsbox. lra is available in Bioconda (https://anaconda.org/bioconda/lra) and GitHub (https://github.com/ChaissonLab/LRA).; https://doi.org/10.1371/journal.pcbi.1009078 |
spellingShingle | Jingwen Ren Mark J P Chaisson lra: A long read aligner for sequences and contigs. PLoS Computational Biology |
title | lra: A long read aligner for sequences and contigs. |
title_sort | lra a long read aligner for sequences and contigs |
url | https://doi.org/10.1371/journal.pcbi.1009078 |
work_keys_str_mv | AT jingwenren lraalongreadalignerforsequencesandcontigs AT markjpchaisson lraalongreadalignerforsequencesandcontigs |
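
The description above contrasts linear and concave gap costs when chaining exact matches. The following is a minimal sketch of that idea, not lra's implementation: it uses a naive quadratic-time chaining loop rather than an efficient SDP algorithm, and the function names, the anchor format `(query_start, target_start, length)`, the logarithmic cost, and the constants are all assumptions chosen for illustration.

```python
import math


def linear_gap_cost(gap, per_base=1.0):
    """Hypothetical linear penalty: cost grows in proportion to gap length."""
    return per_base * gap


def concave_gap_cost(gap, scale=4.0):
    """Hypothetical concave penalty: cost grows logarithmically with gap length."""
    return scale * math.log2(1 + gap)


def chain_anchors(anchors, gap_cost):
    """Find the best-scoring chain of exact-match anchors.

    anchors: list of (query_start, target_start, length) tuples.
    gap_cost: function mapping a gap length to a penalty.
    Returns (best_score, chain), where chain is a list of anchors.
    This is a naive O(n^2) dynamic program used only to illustrate scoring.
    """
    anchors = sorted(anchors)
    n = len(anchors)
    if n == 0:
        return 0.0, []
    score = [float(a[2]) for a in anchors]  # a chain may start at any anchor
    prev = [-1] * n
    for j in range(n):
        qj, tj, lj = anchors[j]
        for i in range(j):
            qi, ti, li = anchors[i]
            # Anchor i may precede anchor j only if it ends before j starts
            # in both the query and the target.
            if qi + li <= qj and ti + li <= tj:
                # Gap length is the shift between the two anchors' diagonals.
                gap = abs((qj - qi) - (tj - ti))
                candidate = score[i] + lj - gap_cost(gap)
                if candidate > score[j]:
                    score[j] = candidate
                    prev[j] = i
    best = max(range(n), key=score.__getitem__)
    chain = []
    k = best
    while k != -1:
        chain.append(anchors[k])
        k = prev[k]
    chain.reverse()
    return score[best], chain


# Four 50-base anchors; the last two are shifted by 1000 bases in the target,
# as they would be around a 1 kb insertion or deletion.
example_anchors = [(0, 0, 50), (50, 50, 50), (100, 1100, 50), (150, 1150, 50)]
linear_score, linear_chain = chain_anchors(example_anchors, linear_gap_cost)
concave_score, concave_chain = chain_anchors(example_anchors, concave_gap_cost)
print(len(linear_chain), len(concave_chain))
```

With these illustrative constants, the linear penalty makes it cheaper to break the chain at the 1000-base shift than to cross it, while the concave penalty lets a single chain span the shift, which is the behavior the description associates with better recovery of larger variants.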