Simple Matching Using QIIME 2 and RDP Reveals Misidentified Sequences and an Underrepresentation of Fungi in Reference Datasets

Bibliographic Details
Main Authors: Lauren E. Eldred, R. Greg Thorn, David Roy Smith
Format: Article
Language: English
Published: Frontiers Media S.A., 2021-11-01
Series: Frontiers in Genetics
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2021.768473/full
ISSN: 1664-8021
Volume: 12 (article 768473)
DOI: 10.3389/fgene.2021.768473
Keywords: Basidiomycota; metabarcoding; misidentification; SILVA; sequence identification

Abstract

Simple nucleotide matching identification methods are not as accurate as once thought at identifying environmental fungal sequences. This is largely because of incorrect naming and the underrepresentation of various fungal groups in reference datasets. Here, we explore these issues by examining an environmental metabarcoding dataset of partial large subunit rRNA sequences of Basidiomycota and basal fungi. We employed the simple matching method using the QIIME 2 classifier and the RDP Classifier in conjunction with the latest releases of the SILVA (138.1, 2020) and RDP (11, 2014) reference datasets and then compared the results with a manual phylogenetic binning approach. Of the 71 query sequences tested, 21 and 42% were misidentified using QIIME 2 and the RDP Classifier, respectively. Of these simple matching misidentifications, more than half resulted from the underrepresentation of various groups of fungi in the SILVA and RDP reference datasets. More comprehensive reference datasets with fewer misidentified sequences will increase the accuracy of simple matching identifications. However, we argue that the phylogenetic binning approach is a better alternative to simple matching since, in addition to better accuracy, it provides evolutionary information about query sequences.
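The abstract attributes more than half of the simple matching misidentifications to fungal groups that are underrepresented in the reference datasets. The following is a minimal sketch of one mechanism behind that, using a toy naive Bayesian k-mer classifier loosely in the spirit of the RDP Classifier and of the naive Bayes classifiers commonly trained with QIIME 2's q2-feature-classifier plugin. It is not the authors' pipeline and not either tool's implementation. The taxon labels, sequences, word length k = 4, and the 0.5 pseudo-count are hypothetical choices made for illustration. The point it shows: the classifier can only return labels present in its reference set, so a query from an unrepresented group is assigned to the closest represented taxon.

```python
import math
from collections import defaultdict


def kmer_set(seq, k):
    """Return the set of overlapping k-mers ("words") present in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(reference, k):
    """Count, per taxon, how many reference sequences contain each k-mer.

    `reference` is a list of (taxon_label, sequence) pairs.
    """
    word_counts = defaultdict(lambda: defaultdict(int))
    n_seqs = defaultdict(int)
    for taxon, seq in reference:
        n_seqs[taxon] += 1
        for word in kmer_set(seq, k):
            word_counts[taxon][word] += 1
    return word_counts, n_seqs


def classify(query, word_counts, n_seqs, k):
    """Return the reference taxon with the highest naive Bayes log-likelihood.

    P(word | taxon) is estimated as (m + 0.5) / (M + 1), where m is the number
    of the taxon's reference sequences containing the word and M is the number
    of reference sequences for that taxon (a simplified, assumed pseudo-count).
    No confidence threshold is applied, so some label is always returned.
    """
    words = kmer_set(query, k)
    best_taxon, best_score = None, -math.inf
    for taxon, total in n_seqs.items():
        counts = word_counts[taxon]
        score = sum(math.log((counts.get(w, 0) + 0.5) / (total + 1)) for w in words)
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Hypothetical toy reference in which only two groups are represented.
reference = [
    ("taxon_A", "AAAAAAAACCCCCCCC"),
    ("taxon_A", "AAAAAAAACCCCCCCA"),
    ("taxon_B", "GGGGGGGGTTTTTTTT"),
    ("taxon_B", "GGGGGGGGTTTTTTTG"),
]
K = 4
word_counts, n_seqs = train(reference, K)

print(classify("AAAAAAAACCCCCCCC", word_counts, n_seqs, K))  # taxon_A
# A hypothetical query from a group absent from the reference is still forced
# into a represented taxon; nothing in the output flags it as unrepresented.
print(classify("AAAACCCCTTTTGGGG", word_counts, n_seqs, K))  # taxon_A
```

The real classifiers add a confidence estimate (the RDP Classifier, for example, bootstraps subsets of words) that can leave a query unassigned at lower ranks, and this sketch omits that step. Even with such a step, an assignment can be wrong when the query's true group is missing from, or mislabeled in, the reference dataset.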