Assessing the reliability of medicinal Dendrobium sequences in GenBank for botanical species identification

Bibliographic Details
Main Authors: Hoi-Yan Wu, Kwun-Tin Chan, Grace Wing-Chiu But, Pang-Chui Shaw
Format: Article
Language: English
Published: Nature Portfolio 2021-02-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-021-82385-z
collection DOAJ
description DNA-based methods are a promising tool for species identification and are widely used in various fields. DNA barcoding has already been included in several pharmacopoeias for the identification of medicinal materials and botanicals. The accuracy and validity of DNA-based methods depend on the accuracy and taxonomic reliability of the reference sequences in the database being searched. Here we evaluated the annotation quality and taxonomic reliability of selected barcode loci (rbcL, matK, psbA-trnH, trnL-trnF and ITS) of 41 medicinal Dendrobium species downloaded from GenBank. Annotations of most accessions are incomplete. Only 53.06% of the 2041 accessions downloaded contain a reference to a voucher specimen. Only 31.60% and 4.8% of the entries are annotated with country of origin and with collector or assessor, respectively. Taxonomic reliability of the sequences was evaluated with a Megablast search, based on similarity to sequences submitted by other research groups. A small number of sequences (211, 7.14%) were regarded as highly doubted. Moreover, 10 of the 60 complete chloroplast genomes contain highly doubted sequences. Our findings suggest that GenBank sequences should be used with caution for species-level identification. The scientific community should provide more information on the identity and traceability of the sample when depositing sequences in public databases.
id doaj.art-430fb3aee7ab4e5fb995958210e61e0b
institution Directory Open Access Journal
issn 2045-2322
spelling doaj.art-430fb3aee7ab4e5fb995958210e61e0b; 2022-12-21T20:35:27Z; eng; Nature Portfolio; Scientific Reports; ISSN 2045-2322; 2021-02-01; vol. 11, no. 1, pp. 1-9; https://doi.org/10.1038/s41598-021-82385-z
Hoi-Yan Wu (Li Dak Sum Yip Yio Chin R&D Centre for Chinese Medicine, The Chinese University of Hong Kong)
Kwun-Tin Chan (Li Dak Sum Yip Yio Chin R&D Centre for Chinese Medicine, The Chinese University of Hong Kong)
Grace Wing-Chiu But (School of Life Sciences, The Chinese University of Hong Kong)
Pang-Chui Shaw (Li Dak Sum Yip Yio Chin R&D Centre for Chinese Medicine, The Chinese University of Hong Kong)