Interpretations of Environmental Microbial Community Studies Are Biased by the Selected 16S rRNA (Gene) Amplicon Sequencing Pipeline

Bibliographic Details
Main Authors: Daniel Straub, Nia Blackwell, Adrian Langarica-Fuentes, Alexander Peltzer, Sven Nahnsen, Sara Kleindienst
Format: Article
Language: English
Published: Frontiers Media S.A. 2020-10-01
Series: Frontiers in Microbiology
Subjects: 16S rRNA; amplicon sequencing; environmental samples; bioinformatics; nf-core/ampliseq
Online Access: https://www.frontiersin.org/articles/10.3389/fmicb.2020.550420/full
author Daniel Straub
Nia Blackwell
Adrian Langarica-Fuentes
Alexander Peltzer
Sven Nahnsen
Sara Kleindienst
author_sort Daniel Straub
collection DOAJ
description One of the major methods to identify microbial community composition, to unravel microbial population dynamics, and to explore microbial diversity in environmental samples is high-throughput DNA- or RNA-based 16S rRNA (gene) amplicon sequencing in combination with bioinformatics analyses. However, for environmental samples from contrasting habitats, it has not been systematically evaluated (i) which analysis methods provide results that reflect reality most accurately, (ii) how the interpretations of microbial community studies are biased by different analysis methods, and (iii) whether the optimal analysis workflow can be implemented in an easy-to-use pipeline. Here, we compared the performance of 16S rRNA (gene) amplicon sequencing analysis tools (i.e., Mothur, QIIME1, QIIME2, and MEGAN) using three mock datasets with known microbial community composition that differed in sequencing quality, species number and abundance distribution (i.e., even or uneven), and phylogenetic diversity (i.e., closely related or well-separated amplicon sequences). Our results showed that QIIME2 outperformed all other investigated tools in sequence recovery (>10 times fewer false positives), taxonomic assignments (>22% better F-score), and diversity estimates (>5% better assessment), suggesting that this approach reflects the in situ microbial community most accurately. Further analysis of 24 environmental datasets obtained from four contrasting terrestrial and freshwater sites revealed dramatic differences at genus level in the microbial community composition reported by the different pipelines. For instance, at the investigated river water sites, Sphaerotilus was reported only when using QIIME1 (8% abundance) and Agitococcus only with QIIME1 or QIIME2 (2 or 3% abundance, respectively); both genera remained undetected when analyzed with Mothur or MEGAN.
Since these abundant taxa probably have implications for important biogeochemical cycles (e.g., nitrate and sulfate reduction) at these sites, their detection and semi-quantitative enumeration are crucial for valid interpretations. A high-performance computing conformant workflow was constructed to allow FAIR (Findable, Accessible, Interoperable, and Re-usable) 16S rRNA (gene) amplicon sequence analysis starting from raw sequence files, using the best-performing methods identified in our study. The presented workflow should be considered for future studies, as it substantially facilitates the analysis of high-throughput 16S rRNA (gene) sequencing data while maximizing reliability and confidence in microbial community data analysis.
first_indexed 2024-12-12T13:41:27Z
format Article
id doaj.art-803564f752704ae9ab8da9948947d007
institution Directory Open Access Journal
issn 1664-302X
language English
last_indexed 2024-12-12T13:41:27Z
publishDate 2020-10-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Microbiology
spelling doaj.art-803564f752704ae9ab8da9948947d007
2022-12-22T00:22:48Z
eng
Frontiers Media S.A.
Frontiers in Microbiology
1664-302X
2020-10-01
11
10.3389/fmicb.2020.550420
550420
Interpretations of Environmental Microbial Community Studies Are Biased by the Selected 16S rRNA (Gene) Amplicon Sequencing Pipeline
Daniel Straub (Microbial Ecology, Center for Applied Geoscience, Department of Geosciences, University of Tübingen, Tübingen, Germany; Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany)
Nia Blackwell (Microbial Ecology, Center for Applied Geoscience, Department of Geosciences, University of Tübingen, Tübingen, Germany)
Adrian Langarica-Fuentes (Microbial Ecology, Center for Applied Geoscience, Department of Geosciences, University of Tübingen, Tübingen, Germany)
Alexander Peltzer (Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany)
Sven Nahnsen (Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany)
Sara Kleindienst (Microbial Ecology, Center for Applied Geoscience, Department of Geosciences, University of Tübingen, Tübingen, Germany)
https://www.frontiersin.org/articles/10.3389/fmicb.2020.550420/full
16S rRNA
amplicon sequencing
environmental samples
bioinformatics
nf-core/ampliseq
title Interpretations of Environmental Microbial Community Studies Are Biased by the Selected 16S rRNA (Gene) Amplicon Sequencing Pipeline
topic 16S rRNA
amplicon sequencing
environmental samples
bioinformatics
nf-core/ampliseq
url https://www.frontiersin.org/articles/10.3389/fmicb.2020.550420/full