Pan-genomic matching statistics for targeted nanopore sequencing
Summary: Nanopore sequencing is an increasingly powerful tool for genomics. Recently, computational advances have allowed nanopores to sequence in a targeted fashion; as the sequencer emits data, software can analyze the data in real time and signal the sequencer to eject “nontarget” DNA molecules....
Main Authors: | Omar Ahmed, Massimiliano Rossi, Sam Kovaka, Michael C. Schatz, Travis Gagie, Christina Boucher, Ben Langmead |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2021-06-01 |
Series: | iScience |
Subjects: | Genomics, Biotechnology, Bioinformatics, Biocomputational Method |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2589004221006647 |
_version_ | 1818928278089498624 |
---|---|
author | Omar Ahmed, Massimiliano Rossi, Sam Kovaka, Michael C. Schatz, Travis Gagie, Christina Boucher, Ben Langmead |
author_facet | Omar Ahmed, Massimiliano Rossi, Sam Kovaka, Michael C. Schatz, Travis Gagie, Christina Boucher, Ben Langmead |
author_sort | Omar Ahmed |
collection | DOAJ |
description | Summary: Nanopore sequencing is an increasingly powerful tool for genomics. Recently, computational advances have allowed nanopores to sequence in a targeted fashion; as the sequencer emits data, software can analyze the data in real time and signal the sequencer to eject “nontarget” DNA molecules. We present a novel method called SPUMONI, which enables rapid and accurate targeted sequencing using efficient pan-genome indexes. SPUMONI uses a compressed index to rapidly generate exact or approximate matching statistics in a streaming fashion. When used to target a specific strain in a mock community, SPUMONI has accuracy similar to that of minimap2 when both are run against an index containing many strains per species. However, SPUMONI is 12 times faster than minimap2. SPUMONI's index and peak memory footprint are also 16 and 4 times smaller than those of minimap2, respectively. This could enable accurate targeted sequencing even when the targeted strains have not necessarily been sequenced or assembled previously. |
first_indexed | 2024-12-20T03:26:22Z |
format | Article |
id | doaj.art-2b75cd8df2e1468088412279819295cd |
institution | Directory of Open Access Journals |
issn | 2589-0042 |
language | English |
last_indexed | 2024-12-20T03:26:22Z |
publishDate | 2021-06-01 |
publisher | Elsevier |
record_format | Article |
series | iScience |
spelling | doaj.art-2b75cd8df2e1468088412279819295cd; 2022-12-21T19:55:05Z; eng; Elsevier; iScience; 2589-0042; 2021-06-01; volume 24, issue 6, article 102696; Pan-genomic matching statistics for targeted nanopore sequencing; Omar Ahmed (0), Massimiliano Rossi (1), Sam Kovaka (2), Michael C. Schatz (3), Travis Gagie (4), Christina Boucher (5), Ben Langmead (6); (0) Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Corresponding author; (1) Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA; (2) Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; (3) Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; (4) Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada; (5) Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA; (6) Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Corresponding author; Summary: Nanopore sequencing is an increasingly powerful tool for genomics. Recently, computational advances have allowed nanopores to sequence in a targeted fashion; as the sequencer emits data, software can analyze the data in real time and signal the sequencer to eject “nontarget” DNA molecules. We present a novel method called SPUMONI, which enables rapid and accurate targeted sequencing using efficient pan-genome indexes. SPUMONI uses a compressed index to rapidly generate exact or approximate matching statistics in a streaming fashion. When used to target a specific strain in a mock community, SPUMONI has accuracy similar to that of minimap2 when both are run against an index containing many strains per species. However, SPUMONI is 12 times faster than minimap2. SPUMONI's index and peak memory footprint are also 16 and 4 times smaller than those of minimap2, respectively. This could enable accurate targeted sequencing even when the targeted strains have not necessarily been sequenced or assembled previously.; http://www.sciencedirect.com/science/article/pii/S2589004221006647; Genomics; Biotechnology; Bioinformatics; Biocomputational Method |
spellingShingle | Omar Ahmed, Massimiliano Rossi, Sam Kovaka, Michael C. Schatz, Travis Gagie, Christina Boucher, Ben Langmead; Pan-genomic matching statistics for targeted nanopore sequencing; iScience; Genomics, Biotechnology, Bioinformatics, Biocomputational Method |
title | Pan-genomic matching statistics for targeted nanopore sequencing |
title_full | Pan-genomic matching statistics for targeted nanopore sequencing |
title_fullStr | Pan-genomic matching statistics for targeted nanopore sequencing |
title_full_unstemmed | Pan-genomic matching statistics for targeted nanopore sequencing |
title_short | Pan-genomic matching statistics for targeted nanopore sequencing |
title_sort | pan genomic matching statistics for targeted nanopore sequencing |
topic | Genomics, Biotechnology, Bioinformatics, Biocomputational Method |
url | http://www.sciencedirect.com/science/article/pii/S2589004221006647 |
work_keys_str_mv | AT omarahmed pangenomicmatchingstatisticsfortargetednanoporesequencing AT massimilianorossi pangenomicmatchingstatisticsfortargetednanoporesequencing AT samkovaka pangenomicmatchingstatisticsfortargetednanoporesequencing AT michaelcschatz pangenomicmatchingstatisticsfortargetednanoporesequencing AT travisgagie pangenomicmatchingstatisticsfortargetednanoporesequencing AT christinaboucher pangenomicmatchingstatisticsfortargetednanoporesequencing AT benlangmead pangenomicmatchingstatisticsfortargetednanoporesequencing |
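
The description above refers to matching statistics without defining them. Under the usual definition, the matching statistic at position i of a read is the length of the longest prefix of the read's suffix starting at i that occurs somewhere in the reference. The sketch below is a naive illustration of that definition only: the function name `matching_statistics` and the toy sequences are hypothetical, and the substring scan stands in for the compressed pan-genome index the summary describes; it is not SPUMONI's implementation.

```python
from typing import List


def matching_statistics(reference: str, read: str) -> List[int]:
    """Naive matching statistics of `read` against `reference`.

    ms[i] is the length of the longest prefix of read[i:] that occurs
    as a substring of reference. This brute-force scan is for
    illustration only; a real tool would query a compressed index.
    """
    ms = []
    for i in range(len(read)):
        length = 0
        # Extend the match one character at a time while it still
        # occurs somewhere in the reference.
        while i + length < len(read) and read[i:i + length + 1] in reference:
            length += 1
        ms.append(length)
    return ms


if __name__ == "__main__":
    # Toy sequences (hypothetical, not from the article).
    print(matching_statistics("ACGTACGT", "CGTTAC"))  # [3, 2, 1, 3, 2, 1]
```

Long matching statistics suggest that a read resembles the indexed sequences, while consistently short ones suggest that it does not; how SPUMONI turns these values into an eject decision is not stated in this record.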