MIDORI2: A collection of quality controlled, preformatted, and regularly updated reference databases for taxonomic assignment of eukaryotic mitochondrial sequences

Bibliographic Details
Main Authors: Matthieu Leray, Nancy Knowlton, Ryuji J. Machida
Format: Article
Language: English
Published: Wiley 2022-07-01
Series: Environmental DNA
Subjects: eukaryote, fungi, GenBank, metabarcoding, metazoan, mitochondrial genes
Online Access: https://doi.org/10.1002/edn3.303
collection DOAJ
description Abstract Analysis of environmental DNA is increasingly used to characterize ecological communities, but the effectiveness of this approach depends on the accuracy of taxonomic reference databases. The MIDORI databases, first released in 2017, were built to improve accuracy for mitochondrial metazoan (animal) sequences. MIDORI has now been significantly improved and renamed MIDORI2 (available at http://www.reference‐midori.info). Like MIDORI, MIDORI2 is built from GenBank and contains curated sequences of thirteen protein‐coding and two ribosomal RNA mitochondrial genes. Coverage has been substantially expanded to cover all eukaryotes, including fungi, green algae and land plants, other multicellular algal groups, and diverse protist lineages. MIDORI2 also now includes not only species with full binomials, but also taxa referred to by genus with species left unspecified (“sp.”). Another new feature is the updating of the databases approximately every two months with version numbers corresponding to each new GenBank release. Additional potentially erroneously annotated sequences have also been removed. Finally, the ability to export data files to BLAST+ has been added to the original ability to export preformatted data to five taxonomic assignment programs, and databases of amino acid sequences are also made available for protein‐coding genes. As a technical validation, we conducted a preliminary comparison of the performance of MIDORI2 with five taxonomic assignment programs. Results suggest that BLAST+ top hits performed better for assigning CO1 sequences than alignment‐free methods based on compositional features. Comparing MIDORI2 with two other commonly used curated databases of mitochondrial sequences, CO‐ARBitrator and BOLD, we show that MIDORI2 includes sequences from a broader range of metazoan and non‐metazoan taxa. 
Overall, in many contexts, MIDORI2 offers clear advantages: a higher diversity of taxa than other databases, a variety of user‐friendly features, and regular updates. MIDORI2 is particularly well‐suited for environmental DNA studies that target mitochondrial genes with broad primers.
id doaj.art-1ea80ec22ceb4d449cd2aaedbb390424
institution Directory Open Access Journal
issn 2637-4943
spelling doaj.art-1ea80ec22ceb4d449cd2aaedbb390424 (2022-12-22T01:39:40Z)
Environmental DNA, volume 4, issue 4, pages 894-907, 2022-07-01, Wiley, ISSN 2637-4943, https://doi.org/10.1002/edn3.303
Matthieu Leray: Smithsonian Tropical Research Institute, Balboa Ancon, Panama
Nancy Knowlton: National Museum of Natural History, Smithsonian Institution, Washington, District of Columbia, USA
Ryuji J. Machida: Biodiversity Research Centre, Academia Sinica, Taipei, Taiwan
topic eukaryote
fungi
GenBank
metabarcoding
metazoan
mitochondrial genes