Multi-amplicon microbiome data analysis pipelines for mixed orientation sequences using QIIME2: Assessing reference database, variable region and pre-processing bias in classification of mock bacterial community samples.

Microbiome research relies on next-generation sequencing and on downstream data analysis workflows. Several manufacturers have introduced multi-amplicon kits for microbiome characterization, improving speciation, but present unique challenges for analysis. The goal of this methodology study was to d...

Full description

Bibliographic Details
Main Authors: Katherine A Maki, Brian Wolff, Leonardo Varuzza, Stefan J Green, Jennifer J Barb
Format: Article
Language:English
Published: Public Library of Science (PLoS) 2023-01-01
Series:PLoS ONE
Online Access:https://doi.org/10.1371/journal.pone.0280293
Description
Summary:Microbiome research relies on next-generation sequencing and on downstream data analysis workflows. Several manufacturers have introduced multi-amplicon kits for microbiome characterization, improving speciation, but present unique challenges for analysis. The goal of this methodology study was to develop two analysis pipelines specific to mixed-orientation reads from multi-hypervariable (V) region amplicons. A secondary aim was to assess agreement with expected abundance, considering database and variable region. Mock community sequence data (n = 41) generated using the Ion16S™ Metagenomics Kit and Ion Torrent Sequencing Platform were analyzed using two workflows. Amplicons from V2, V3, V4, V6-7, V8 and V9 were deconvoluted using a specialized plugin based on CutPrimers. A separate workflow using Cutadapt is also presented. Three reference databases (Ribosomal Database Project, Greengenes and Silva) were used for taxonomic assignment. Bray-Curtis, Euclidean and Jensen-Shannon distance measures were used to evaluate overall annotation consistency, and specific taxon agreement was determined by calculating the ratio of observed to expected relative abundance. Reads that mapped to regions V2-V9 varied for both CutPrimers and Cutadapt-based methods. Within the CutPrimers-based pipeline, V3 amplicons had the best agreement with the expected distribution, tested using global distance measures, while V9 amplicons had the worst agreement. Accurate taxonomic annotation varied by genus-level taxon and V region analyzed. For the first time, we present a microbiome analysis pipeline that employs a specialized plugin to allow microbiome researchers to separate multi-amplicon data from the Ion16S Metagenomics Kit into V-specific reads. We also present an additional analysis workflow, modified for Ion Torrent mixed orientation reads. Overall, the global agreement of amplicons with the expected mock community abundances differed across V regions and reference databases. Benchmarking data should be referenced when planning a microbiome study to consider these biases related to sequencing and data analysis for multi-amplicon sequencing kits.
ISSN:1932-6203