NGSpeciesID: DNA barcode and amplicon consensus generation from long‐read sequencing data

Abstract Third‐generation sequencing technologies, such as Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio), have gained popularity over the last years. These platforms can generate millions of long‐read sequences. This is not only advantageous for genome sequencing projects, but...


Bibliographic Details
Main Authors: Kristoffer Sahlin, Marisa C. W. Lim, Stefan Prost
Format: Article
Language: English
Published: Wiley 2021-02-01
Series: Ecology and Evolution
Subjects: amplicon sequencing; DNA barcoding; sequence clustering; third‐generation sequencing
Online Access: https://doi.org/10.1002/ece3.7146
collection DOAJ
description Abstract Third‐generation sequencing technologies, such as Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio), have gained popularity over the last years. These platforms can generate millions of long‐read sequences. This is not only advantageous for genome sequencing projects, but also advantageous for amplicon‐based high‐throughput sequencing experiments, such as DNA barcoding. However, the relatively high error rates associated with these technologies still pose challenges for generating high‐quality consensus sequences. Here, we present NGSpeciesID, a program which can generate highly accurate consensus sequences from long‐read amplicon sequencing technologies, including ONT and PacBio. The tool includes clustering of the reads to help filter out contaminants or reads with high error rates and employs polishing strategies specific to the appropriate sequencing platform. We show that NGSpeciesID produces consensus sequences with improved usability by minimizing preprocessing and software installation and scalability by enabling rapid processing of hundreds to thousands of samples, while maintaining similar consensus accuracy as current pipelines.
first_indexed 2024-12-16T23:15:34Z
format Article
id doaj.art-40fddca195964d17bc4d5f95ff639f37
institution Directory of Open Access Journals
issn 2045-7758
language English
last_indexed 2024-12-16T23:15:34Z
publishDate 2021-02-01
publisher Wiley
record_format Article
series Ecology and Evolution
spelling doaj.art-40fddca195964d17bc4d5f95ff639f37
2022-12-21T22:12:18Z
eng
Wiley
Ecology and Evolution
2045-7758
2021-02-01
Volume 11, Issue 3, pages 1392-1398
10.1002/ece3.7146
NGSpeciesID: DNA barcode and amplicon consensus generation from long‐read sequencing data
Kristoffer Sahlin (0), Marisa C. W. Lim (1), Stefan Prost (2)
0: Department of Mathematics, Science for Life Laboratory, Stockholm University, Stockholm, Sweden
1: Department of Population Health and Reproduction, University of California, Davis, CA, USA
2: LOEWE‐Centre for Translational Biodiversity Genomics, Senckenberg, Frankfurt, Germany
https://doi.org/10.1002/ece3.7146
amplicon sequencing; DNA barcoding; sequence clustering; third‐generation sequencing
title NGSpeciesID: DNA barcode and amplicon consensus generation from long‐read sequencing data
topic amplicon sequencing
DNA barcoding
sequence clustering
third‐generation sequencing
url https://doi.org/10.1002/ece3.7146