Be positive: customized reference databases and new, local barcodes balance false taxonomic assignments in metabarcoding studies

Background In metabarcoding analyses, the taxonomic assignment is crucial to place sequencing data in biological and ecological contexts. This fundamental step depends on a reference database, which should have a good taxonomic coverage to avoid unassigned sequences. However, this goal is rarely ach...

Bibliographic Details
Main Authors: Francesco Mugnai, Federica Costantini, Anne Chenuil, Michèle Leduc, José Miguel Gutiérrez Ortega, Emese Meglécz
Format: Article
Language: English
Published: PeerJ Inc. 2023-01-01
Series: PeerJ
Subjects: Reference databases, Metabarcoding, COI, Taxonomic assignment, Marine taxa, COInr
Online Access: https://peerj.com/articles/14616.pdf
author Francesco Mugnai
Federica Costantini
Anne Chenuil
Michèle Leduc
José Miguel Gutiérrez Ortega
Emese Meglécz
collection DOAJ
description Background: In metabarcoding analyses, taxonomic assignment is crucial for placing sequencing data in biological and ecological contexts. This fundamental step depends on a reference database, which should have good taxonomic coverage to avoid unassigned sequences. However, this goal is rarely achieved in many geographic regions and for several taxonomic groups. On the other hand, more is not necessarily better, as reference sequences from taxonomic groups outside the studied region or environment might lead to false assignments.
Methods: We investigated the effect of using several subsets of a cytochrome c oxidase subunit I (COI) reference database on taxonomic assignment. Published metabarcoding sequences from the Mediterranean Sea were assigned to taxa using COInr, a comprehensive, non-redundant and recent database of COI sequences obtained from both BOLD and NCBI, and two of its subsets: (i) all sequences except those of insects (COInr-WO-Insecta); insect sequences make up the overwhelming majority of the COInr database but are irrelevant for marine samples, and (ii) all sequences from taxonomic families present in the Mediterranean Sea (COInr-Med). Four different algorithms for taxonomic assignment were employed in parallel to evaluate differences in their output and data consistency.
Results: Reducing the database to more specific custom subsets increased the number of unassigned sequences. Nevertheless, since most of these sequences had been incorrectly assigned when using the less specific databases, this is a positive outcome. Moreover, the taxonomic resolution (the lowest taxonomic level to which a sequence is attributed) of several sequences tended to increase when using customized databases. These findings clearly indicated the need for customized databases adapted to each study. However, the very high proportion of unassigned sequences points to the need to enrich the local database with new barcodes specifically obtained from the studied region and/or taxonomic group. Adding novel local barcodes to the COI database proved very effective: by adding only 116 new barcodes sequenced in our laboratory, thus increasing the reference database by only 0.04%, we were able to improve the resolution for ca. 0.6–1% of the Amplicon Sequence Variants (ASVs).
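The two database subsets described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline and does not read the real COInr files: the `RefSeq` record, its field names, the toy sequences, and the `toy_regional_families` set are all hypothetical and chosen only to show the two filtering ideas (excluding one class, and keeping only families from a regional list).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefSeq:
    """Hypothetical reference record; real databases carry more fields."""
    seq_id: str
    tax_class: str
    family: str


def without_class(refs, excluded_class):
    """Keep every reference sequence that is not in the excluded class."""
    return [r for r in refs if r.tax_class != excluded_class]


def only_families(refs, families):
    """Keep only reference sequences whose family is in the given set."""
    wanted = set(families)
    return [r for r in refs if r.family in wanted]


# Toy data (assumption): four made-up records standing in for a database.
toy_refs = [
    RefSeq("s1", "Insecta", "Culicidae"),
    RefSeq("s2", "Anthozoa", "Coralliidae"),
    RefSeq("s3", "Malacostraca", "Portunidae"),
    RefSeq("s4", "Anthozoa", "Acroporidae"),
]

# Illustrative family list (assumption), not a real regional checklist.
toy_regional_families = {"Coralliidae", "Portunidae"}

# Analogue of an "all sequences except insects" subset.
wo_insecta = without_class(toy_refs, "Insecta")

# Analogue of a "families present in the study region" subset.
regional = only_families(toy_refs, toy_regional_families)

print([r.seq_id for r in wo_insecta])
print([r.seq_id for r in regional])
```

The regional filter is the stricter of the two: it drops any sequence whose family is absent from the list, which is why a customized database of this kind can leave more sequences unassigned while also removing candidates for false assignments.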
first_indexed 2024-03-09T06:58:38Z
format Article
id doaj.art-f3606d7d479249478dc4e0e1ec7ed7be
institution Directory Open Access Journal
issn 2167-8359
language English
last_indexed 2024-03-09T06:58:38Z
publishDate 2023-01-01
publisher PeerJ Inc.
record_format Article
series PeerJ
spelling doaj.art-f3606d7d479249478dc4e0e1ec7ed7be
2023-12-03T09:59:49Z
eng
PeerJ Inc.
PeerJ
2167-8359
2023-01-01
11
e14616
10.7717/peerj.14616
Be positive: customized reference databases and new, local barcodes balance false taxonomic assignments in metabarcoding studies
Francesco Mugnai (0), Department of Biological, Geological and Environmental Sciences (BiGeA), University of Bologna, Ravenna, Italy
Federica Costantini (1), Department of Biological, Geological and Environmental Sciences (BiGeA), University of Bologna, Ravenna, Italy
Anne Chenuil (2), Aix Marseille Univ, Avignon Université, CNRS, IRD, IMBE, Marseille, France
Michèle Leduc (3), STARESO marine station, Calvi, Corse, France
José Miguel Gutiérrez Ortega (4), TAXON Estudios Ambientales S.L., Alcantarilla (Murcia), Spain
Emese Meglécz (5), Aix Marseille Univ, Avignon Université, CNRS, IRD, IMBE, Marseille, France
https://peerj.com/articles/14616.pdf
Reference databases
Metabarcoding
COI
Taxonomic assignment
Marine taxa
COInr
title Be positive: customized reference databases and new, local barcodes balance false taxonomic assignments in metabarcoding studies
topic Reference databases
Metabarcoding
COI
Taxonomic assignment
Marine taxa
COInr
url https://peerj.com/articles/14616.pdf