Signal and noise in metabarcoding data.
Metabarcoding is a powerful molecular tool for simultaneously surveying hundreds to thousands of species from a single sample, underpinning microbiome and environmental DNA (eDNA) methods. Deriving quantitative estimates of underlying biological communities from metabarcoding is critical for enhanci...
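Main Authors: | Zachary Gold, Andrew Olaf Shelton, Helen R Casendino, Joe Duprey, Ramón Gallego, Amy Van Cise, Mary Fisher, Alexander J Jensen, Erin D'Agnese, Elizabeth Andruszkiewicz Allan, Ana Ramón-Laca, Maya Garber-Yonts, Michaela Labare, Kim M Parsons, Ryan P Kelly |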
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2023-01-01 |
Series: | PLoS ONE |
Online Access: | https://doi.org/10.1371/journal.pone.0285674 |
_version_ | 1797802076514811904 |
---|---|
author | Zachary Gold; Andrew Olaf Shelton; Helen R Casendino; Joe Duprey; Ramón Gallego; Amy Van Cise; Mary Fisher; Alexander J Jensen; Erin D'Agnese; Elizabeth Andruszkiewicz Allan; Ana Ramón-Laca; Maya Garber-Yonts; Michaela Labare; Kim M Parsons; Ryan P Kelly |
author_sort | Zachary Gold |
collection | DOAJ |
description | Metabarcoding is a powerful molecular tool for simultaneously surveying hundreds to thousands of species from a single sample, underpinning microbiome and environmental DNA (eDNA) methods. Deriving quantitative estimates of underlying biological communities from metabarcoding is critical for enhancing the utility of such approaches for health and conservation. Recent work has demonstrated that correcting for amplification biases in genetic metabarcoding data can yield quantitative estimates of template DNA concentrations. However, a major source of uncertainty in metabarcoding data stems from non-detections across technical PCR replicates, where one replicate fails to detect a species observed in other replicates. Such non-detections are a special case of variability among technical replicates in metabarcoding data. While many sampling and amplification processes underlie observed variation in metabarcoding data, understanding the causes of non-detections is an important step in distinguishing signal from noise in metabarcoding studies. Here, we use both simulated and empirical data to (1) suggest how non-detections may arise in metabarcoding data, (2) outline steps to recognize uninformative data in practice, and (3) identify the conditions under which amplicon sequence data can reliably detect underlying biological signals. We show with both simulations and empirical data that, for a given species, the rate of non-detections among technical replicates is a function of both the template DNA concentration and species-specific amplification efficiency. Consequently, we conclude that metabarcoding datasets are strongly affected by (1) deterministic amplification biases during PCR and (2) stochastic sampling of amplicons during sequencing, both of which we can model, but also by (3) stochastic sampling of rare molecules prior to PCR, which remains a frontier for quantitative metabarcoding. Our results highlight the importance of estimating species-specific amplification efficiencies and critically evaluating patterns of non-detection in metabarcoding datasets to better distinguish environmental signal from the noise inherent in molecular detections of rare targets. |
first_indexed | 2024-03-13T05:00:11Z |
format | Article |
id | doaj.art-e7eb223b90bf42c6a4a1c79cc7ce7db0 |
institution | Directory Open Access Journal |
issn | 1932-6203 |
language | English |
last_indexed | 2024-03-13T05:00:11Z |
publishDate | 2023-01-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
spelling | doaj.art-e7eb223b90bf42c6a4a1c79cc7ce7db0; 2023-06-17T05:31:30Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2023-01-01; 18; 5; e0285674; 10.1371/journal.pone.0285674; Signal and noise in metabarcoding data.; https://doi.org/10.1371/journal.pone.0285674 |
title | Signal and noise in metabarcoding data. |
url | https://doi.org/10.1371/journal.pone.0285674 |
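The three processes named in the description (stochastic sampling of rare molecules before PCR, deterministic amplification bias, and stochastic sampling of amplicons during sequencing) can be illustrated with a small simulation. What follows is a minimal sketch and not the authors' model: it assumes Poisson-distributed starting copies, growth by a factor of (1 + efficiency) per cycle, and reads drawn in proportion to amplicon abundance. All function names and parameter values (copy numbers, efficiencies, 35 cycles, 2,000 reads, 300 replicates) are hypothetical choices made for illustration.

```python
import math
import random


def poisson(rng, lam):
    """Draw from Poisson(lam). Knuth's method for small lam,
    a rounded normal approximation for larger lam."""
    if lam > 50:
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_replicate(rng, mean_copies, efficiencies, cycles=35, reads=2000):
    """Return read counts per species for one technical PCR replicate."""
    n = len(mean_copies)
    # Stochastic sampling of template molecules into the reaction.
    start = [poisson(rng, m) for m in mean_copies]
    # Deterministic, species-specific amplification (assumed form).
    amplicons = [c * (1.0 + a) ** cycles for c, a in zip(start, efficiencies)]
    counts = [0] * n
    if sum(amplicons) == 0:
        return counts  # nothing amplified, so nothing is sequenced
    # Stochastic sampling of amplicons by the sequencer.
    for i in rng.choices(range(n), weights=amplicons, k=reads):
        counts[i] += 1
    return counts


def non_detection_rate(mean_copies, efficiencies, target, n_reps=300, seed=1):
    """Fraction of replicates in which the target species gets zero reads."""
    rng = random.Random(seed)
    misses = sum(
        simulate_replicate(rng, mean_copies, efficiencies)[target] == 0
        for _ in range(n_reps)
    )
    return misses / n_reps


if __name__ == "__main__":
    # Species 0 is an abundant background; species 1 is the target.
    print("rare template, good efficiency:",
          non_detection_rate([50, 0.5], [0.9, 0.9], target=1))
    print("more template, good efficiency:",
          non_detection_rate([50, 5.0], [0.9, 0.9], target=1))
    print("more template, poor efficiency:",
          non_detection_rate([50, 5.0], [0.9, 0.6], target=1))
```

Under these assumptions, the target's non-detection rate rises both when its template concentration falls and when its amplification efficiency is low relative to the background species, which is the qualitative pattern the description reports.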