Pan-genomic matching statistics for targeted nanopore sequencing

Summary: Nanopore sequencing is an increasingly powerful tool for genomics. Recently, computational advances have allowed nanopores to sequence in a targeted fashion; as the sequencer emits data, software can analyze the data in real time and signal the sequencer to eject “nontarget” DNA molecules....

Bibliographic Details
Main Authors: Omar Ahmed, Massimiliano Rossi, Sam Kovaka, Michael C. Schatz, Travis Gagie, Christina Boucher, Ben Langmead
Format: Article
Language: English
Published: Elsevier 2021-06-01
Series: iScience
Subjects: Genomics; Biotechnology; Bioinformatics; Biocomputational Method
Online Access: http://www.sciencedirect.com/science/article/pii/S2589004221006647
_version_ 1818928278089498624
author Omar Ahmed
Massimiliano Rossi
Sam Kovaka
Michael C. Schatz
Travis Gagie
Christina Boucher
Ben Langmead
author_sort Omar Ahmed
collection DOAJ
description Summary: Nanopore sequencing is an increasingly powerful tool for genomics. Recently, computational advances have allowed nanopores to sequence in a targeted fashion: as the sequencer emits data, software can analyze the data in real time and signal the sequencer to eject “nontarget” DNA molecules. We present a novel method called SPUMONI, which enables rapid and accurate targeted sequencing using efficient pan-genome indexes. SPUMONI uses a compressed index to rapidly generate exact or approximate matching statistics in a streaming fashion. When used to target a specific strain in a mock community, SPUMONI has accuracy similar to that of minimap2 when both are run against an index containing many strains per species. However, SPUMONI is 12 times faster than minimap2. SPUMONI's index and peak memory footprint are also 16 and 4 times smaller, respectively, than those of minimap2. This could enable accurate targeted sequencing even when the targeted strains have not necessarily been sequenced or assembled previously.
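The matching statistics mentioned in the description can be illustrated with a small sketch. For a read and a reference, the matching statistic at position i is the length of the longest prefix of read[i:] that occurs somewhere in the reference. The code below is a naive illustration only: it uses plain substring search rather than the compressed pan-genome index SPUMONI relies on, and the function names, the mean-based decision rule, and the `min_mean_ms` threshold are hypothetical and not taken from the article.

```python
def matching_statistics(read: str, reference: str) -> list[int]:
    """Return MS, where MS[i] is the length of the longest prefix of
    read[i:] that occurs as a substring of reference (naive version)."""
    ms = []
    length = 0
    for i in range(len(read)):
        # If read[i-1 : i-1+L] occurs in the reference, then read[i : i+L-1]
        # does too, so the next value is at least L - 1.
        length = max(length - 1, 0)
        # Extend the match one character at a time while it still occurs.
        while i + length < len(read) and read[i:i + length + 1] in reference:
            length += 1
        ms.append(length)
    return ms


def looks_like_target(read: str, reference: str, min_mean_ms: float = 4.0) -> bool:
    """Hypothetical keep/eject rule: keep the read when its mean matching
    statistic reaches an assumed threshold. Not the article's classifier."""
    ms = matching_statistics(read, reference)
    return bool(ms) and sum(ms) / len(ms) >= min_mean_ms


if __name__ == "__main__":
    reference = "ACGTACGGA"
    print(matching_statistics("CGTAT", reference))
    print(looks_like_target("ACGTACGG", reference))
```

Long matching statistics across a read suggest it came from a sequence represented in the index, while uniformly short values suggest a nontarget molecule. A real-time tool would need to compute these values incrementally as signal arrives, which is where an efficient index matters.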
first_indexed 2024-12-20T03:26:22Z
format Article
id doaj.art-2b75cd8df2e1468088412279819295cd
institution Directory Open Access Journal
issn 2589-0042
language English
last_indexed 2024-12-20T03:26:22Z
publishDate 2021-06-01
publisher Elsevier
record_format Article
series iScience
spelling doaj.art-2b75cd8df2e1468088412279819295cd
2022-12-21T19:55:05Z
eng
Elsevier
iScience
2589-0042
2021-06-01
24(6): 102696
Pan-genomic matching statistics for targeted nanopore sequencing
Omar Ahmed (0): Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Corresponding author
Massimiliano Rossi (1): Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA
Sam Kovaka (2): Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
Michael C. Schatz (3): Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
Travis Gagie (4): Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada
Christina Boucher (5): Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA
Ben Langmead (6): Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Corresponding author
Summary: Nanopore sequencing is an increasingly powerful tool for genomics. Recently, computational advances have allowed nanopores to sequence in a targeted fashion: as the sequencer emits data, software can analyze the data in real time and signal the sequencer to eject “nontarget” DNA molecules. We present a novel method called SPUMONI, which enables rapid and accurate targeted sequencing using efficient pan-genome indexes. SPUMONI uses a compressed index to rapidly generate exact or approximate matching statistics in a streaming fashion. When used to target a specific strain in a mock community, SPUMONI has accuracy similar to that of minimap2 when both are run against an index containing many strains per species. However, SPUMONI is 12 times faster than minimap2. SPUMONI's index and peak memory footprint are also 16 and 4 times smaller, respectively, than those of minimap2. This could enable accurate targeted sequencing even when the targeted strains have not necessarily been sequenced or assembled previously.
http://www.sciencedirect.com/science/article/pii/S2589004221006647
Genomics
Biotechnology
Bioinformatics
Biocomputational Method
title Pan-genomic matching statistics for targeted nanopore sequencing
title_sort pan genomic matching statistics for targeted nanopore sequencing
topic Genomics
Biotechnology
Bioinformatics
Biocomputational Method
url http://www.sciencedirect.com/science/article/pii/S2589004221006647