Signal and noise in metabarcoding data.

Metabarcoding is a powerful molecular tool for simultaneously surveying hundreds to thousands of species from a single sample, underpinning microbiome and environmental DNA (eDNA) methods. Deriving quantitative estimates of underlying biological communities from metabarcoding is critical for enhanci...

Bibliographic Details
Main Authors: Zachary Gold, Andrew Olaf Shelton, Helen R Casendino, Joe Duprey, Ramón Gallego, Amy Van Cise, Mary Fisher, Alexander J Jensen, Erin D'Agnese, Elizabeth Andruszkiewicz Allan, Ana Ramón-Laca, Maya Garber-Yonts, Michaela Labare, Kim M Parsons, Ryan P Kelly
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2023-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0285674
_version_ 1797802076514811904
collection DOAJ
description Metabarcoding is a powerful molecular tool for simultaneously surveying hundreds to thousands of species from a single sample, underpinning microbiome and environmental DNA (eDNA) methods. Deriving quantitative estimates of underlying biological communities from metabarcoding is critical for enhancing the utility of such approaches for health and conservation. Recent work has demonstrated that correcting for amplification biases in genetic metabarcoding data can yield quantitative estimates of template DNA concentrations. However, a major source of uncertainty in metabarcoding data stems from non-detections across technical PCR replicates where one replicate fails to detect a species observed in other replicates. Such non-detections are a special case of variability among technical replicates in metabarcoding data. While many sampling and amplification processes underlie observed variation in metabarcoding data, understanding the causes of non-detections is an important step in distinguishing signal from noise in metabarcoding studies. Here, we use both simulated and empirical data to 1) suggest how non-detections may arise in metabarcoding data, 2) outline steps to recognize uninformative data in practice, and 3) identify the conditions under which amplicon sequence data can reliably detect underlying biological signals. We show with both simulations and empirical data that, for a given species, the rate of non-detections among technical replicates is a function of both the template DNA concentration and species-specific amplification efficiency. Consequently, we conclude that metabarcoding datasets are strongly affected by (1) deterministic amplification biases during PCR and (2) stochastic sampling of amplicons during sequencing, both of which we can model, but also by (3) stochastic sampling of rare molecules prior to PCR, which remains a frontier for quantitative metabarcoding.
Our results highlight the importance of estimating species-specific amplification efficiencies and critically evaluating patterns of non-detection in metabarcoding datasets to better distinguish environmental signal from the noise inherent in molecular detections of rare targets.
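The three processes named in the description can be illustrated with a small simulation. The following is a minimal sketch, not the authors' model: it assumes Poisson sampling of template molecules before PCR, amplification of each species by a factor of (1 + a) per cycle, and sequencing reads drawn in proportion to amplicon abundance. All function names, parameter names, and numeric values are hypothetical and chosen only for illustration.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) variate (Knuth's method; suitable for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_replicate(rng, mean_copies, efficiencies, cycles=35, reads=2000):
    """Simulate one technical PCR replicate and return read counts per species."""
    # (3) Stochastic sampling of rare template molecules prior to PCR
    #     (assumed Poisson around the expected copy number).
    templates = [poisson(rng, lam) for lam in mean_copies]
    # (1) Deterministic amplification bias: each cycle multiplies species i
    #     by (1 + a_i), where a_i is its assumed amplification efficiency.
    amplicons = [n * (1.0 + a) ** cycles for n, a in zip(templates, efficiencies)]
    if sum(amplicons) == 0:
        return [0] * len(mean_copies)
    # (2) Stochastic sampling of amplicons during sequencing
    #     (reads drawn in proportion to amplicon abundance).
    draws = rng.choices(range(len(amplicons)), weights=amplicons, k=reads)
    counts = [0] * len(amplicons)
    for i in draws:
        counts[i] += 1
    return counts


def non_detection_rate(mean_copies, efficiencies, species, n_reps=500, seed=1):
    """Fraction of simulated replicates in which `species` receives zero reads."""
    rng = random.Random(seed)
    misses = sum(
        simulate_replicate(rng, mean_copies, efficiencies)[species] == 0
        for _ in range(n_reps)
    )
    return misses / n_reps


if __name__ == "__main__":
    # Species 1 is the focal species; species 0 is an abundant background taxon.
    print("rare template:     ", non_detection_rate([50, 0.5], [0.9, 0.9], 1))
    print("common template:   ", non_detection_rate([50, 5.0], [0.9, 0.9], 1))
    print("low efficiency:    ", non_detection_rate([50, 5.0], [0.9, 0.6], 1))
```

Under these assumptions, the focal species is missed more often when its expected template copy number is low or when its amplification efficiency is lower than that of co-amplified species, which is the qualitative pattern the description reports.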
first_indexed 2024-03-13T05:00:11Z
format Article
id doaj.art-e7eb223b90bf42c6a4a1c79cc7ce7db0
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-03-13T05:00:11Z
publishDate 2023-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-e7eb223b90bf42c6a4a1c79cc7ce7db0; 2023-06-17T05:31:30Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2023-01-01; vol. 18, no. 5, e0285674; 10.1371/journal.pone.0285674; https://doi.org/10.1371/journal.pone.0285674