Denoising the Denoisers: an independent evaluation of microbiome sequence error-correction approaches

High-depth sequencing of universal marker genes such as the 16S rRNA gene is a common strategy to profile microbial communities. Traditionally, sequence reads are clustered into operational taxonomic units (OTUs) at a defined identity threshold to avoid sequencing errors generating spurious taxonomi...

Bibliographic Details
Main Authors: Jacob T. Nearing, Gavin M. Douglas, André M. Comeau, Morgan G.I. Langille
Format: Article
Language: English
Published: PeerJ Inc. 2018-08-01
Series: PeerJ
Subjects: Microbiome, Denoising tools, Comparison, DADA2, Deblur, UNOISE3
Online Access: https://peerj.com/articles/5364.pdf
_version_ 1797418188865011712
author Jacob T. Nearing
Gavin M. Douglas
André M. Comeau
Morgan G.I. Langille
author_sort Jacob T. Nearing
collection DOAJ
description High-depth sequencing of universal marker genes such as the 16S rRNA gene is a common strategy to profile microbial communities. Traditionally, sequence reads are clustered into operational taxonomic units (OTUs) at a defined identity threshold to avoid sequencing errors generating spurious taxonomic units. However, there have been numerous bioinformatic packages recently released that attempt to correct sequencing errors to determine real biological sequences at single nucleotide resolution by generating amplicon sequence variants (ASVs). As more researchers begin to use high resolution ASVs, there is a need for an in-depth and unbiased comparison of these novel “denoising” pipelines. In this study, we conduct a thorough comparison of three of the most widely-used denoising packages (DADA2, UNOISE3, and Deblur) as well as an open-reference 97% OTU clustering pipeline on mock, soil, and host-associated communities. We found from the mock community analyses that although they produced similar microbial compositions based on relative abundance, the approaches identified vastly different numbers of ASVs that significantly impact alpha diversity metrics. Our analysis on real datasets using recommended settings for each denoising pipeline also showed that the three packages were consistent in their per-sample compositions, resulting in only minor differences based on weighted UniFrac and Bray–Curtis dissimilarity. DADA2 tended to find more ASVs than the other two denoising pipelines when analyzing both the real soil data and two other host-associated datasets, suggesting that it could be better at finding rare organisms, but at the expense of possible false positives. The open-reference OTU clustering approach identified considerably more OTUs in comparison to the number of ASVs from the denoising pipelines in all datasets tested. 
The three denoising approaches were significantly different in their run times, with UNOISE3 running greater than 1,200 and 15 times faster than DADA2 and Deblur, respectively. Our findings indicate that, although all pipelines result in similar general community structure, the number of ASVs/OTUs and resulting alpha-diversity metrics vary considerably and should be considered when attempting to identify rare organisms from possible background noise.
first_indexed 2024-03-09T06:28:55Z
format Article
id doaj.art-763bda3ff5db487e877f3e5862a9419c
institution Directory Open Access Journal
issn 2167-8359
language English
last_indexed 2024-03-09T06:28:55Z
publishDate 2018-08-01
publisher PeerJ Inc.
record_format Article
series PeerJ
spelling doaj.art-763bda3ff5db487e877f3e5862a9419c 2023-12-03T11:11:07Z eng
PeerJ Inc., PeerJ, ISSN 2167-8359, 2018-08-01, vol. 6, e5364, doi:10.7717/peerj.5364
Denoising the Denoisers: an independent evaluation of microbiome sequence error-correction approaches
Jacob T. Nearing (0), Gavin M. Douglas (1), André M. Comeau (2), Morgan G.I. Langille (3)
(0) Department of Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia, Canada
(1) Department of Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia, Canada
(2) Integrated Microbiome Resource, Dalhousie University, Halifax, Nova Scotia, Canada
(3) Department of Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia, Canada
https://peerj.com/articles/5364.pdf
Microbiome; Denoising tools; Comparison; DADA2; Deblur; UNOISE3
title Denoising the Denoisers: an independent evaluation of microbiome sequence error-correction approaches
title_sort denoising the denoisers an independent evaluation of microbiome sequence error correction approaches
topic Microbiome
Denoising tools
Comparison
DADA2
Deblur
UNOISE3
url https://peerj.com/articles/5364.pdf