DNA barcode data accurately assign higher spider taxa


Bibliographic Details
Main Authors: Jonathan A. Coddington, Ingi Agnarsson, Ren-Chung Cheng, Klemen Čandek, Amy Driskell, Holger Frick, Matjaž Gregorič, Rok Kostanjšek, Christian Kropf, Matthew Kweskin, Tjaša Lokovšek, Miha Pipan, Nina Vidergar, Matjaž Kuntner
Format: Article
Language: English
Published: PeerJ Inc. 2016-07-01
Series: PeerJ
Subjects: Taxonomic impediment, Family, Genus, Global Genome Initiative, Genome, DNA barcoding
Online Access: https://peerj.com/articles/2201.pdf
collection DOAJ
description The use of unique DNA sequences as a method for taxonomic identification is no longer fundamentally controversial, even though debate continues on the best markers, methods, and technology to use. Although both existing databanks such as GenBank and BOLD, as well as reference taxonomies, are imperfect, in best case scenarios “barcodes” (whether single or multiple, organelle or nuclear, loci) clearly are an increasingly fast and inexpensive method of identification, especially as compared to manual identification of unknowns by increasingly rare expert taxonomists. Because most species on Earth are undescribed, a complete reference database at the species level is impractical in the near term. The question therefore arises whether unidentified species can, using DNA barcodes, be accurately assigned to more inclusive groups such as genera and families—taxonomic ranks of putatively monophyletic groups for which the global inventory is more complete and stable. We used a carefully chosen test library of CO1 sequences from 49 families, 313 genera, and 816 species of spiders to assess the accuracy of genus and family-level assignment. We used BLAST queries of each sequence against the entire library and got the top ten hits. The percent sequence identity was reported from these hits (PIdent, range 75–100%). Accurate assignment of higher taxa (PIdent above which errors totaled less than 5%) occurred for genera at PIdent values >95 and families at PIdent values ≥ 91, suggesting these as heuristic thresholds for accurate generic and familial identifications in spiders. Accuracy of identification increases with numbers of species/genus and genera/family in the library; above five genera per family and fifteen species per genus all higher taxon assignments were correct. We propose that using percent sequence identity between conventional barcode sequences may be a feasible and reasonably accurate method to identify animals to family/genus. However, the quality of the underlying database impacts accuracy of results; many outliers in our dataset could be attributed to taxonomic and/or sequencing errors in BOLD and GenBank. It seems that an accurate and complete reference library of families and genera of life could provide accurate higher level taxonomic identifications cheaply and accessibly, within years rather than decades.
first_indexed 2024-03-09T06:46:15Z
id doaj.art-51705e40038641519d49f15ebb76650d
institution Directory Open Access Journal
issn 2167-8359
last_indexed 2024-03-09T06:46:15Z
spelling doaj.art-51705e40038641519d49f15ebb76650d 2023-12-03T10:35:09Z eng
PeerJ Inc., PeerJ, ISSN 2167-8359, 2016-07-01, volume 4, article e2201, DOI 10.7717/peerj.2201
Author affiliations (numbered in author order, as indexed in the record):
0 Jonathan A. Coddington: National Museum of Natural History, Smithsonian Institution, Washington, D.C., United States
1 Ingi Agnarsson: National Museum of Natural History, Smithsonian Institution, Washington, D.C., United States
2 Ren-Chung Cheng: EZ Lab, Institute of Biology, Research Centre of the Slovenian Academy of Sciences and Arts, Ljubljana, Slovenia
3 Klemen Čandek: EZ Lab, Institute of Biology, Research Centre of the Slovenian Academy of Sciences and Arts, Ljubljana, Slovenia
4 Amy Driskell: National Museum of Natural History, Smithsonian Institution, Washington, D.C., United States
5 Holger Frick: Department of Invertebrates, Natural History Museum Bern, Bern, Switzerland
6 Matjaž Gregorič: EZ Lab, Institute of Biology, Research Centre of the Slovenian Academy of Sciences and Arts, Ljubljana, Slovenia
7 Rok Kostanjšek: Department of Biology, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
8 Christian Kropf: Department of Invertebrates, Natural History Museum Bern, Bern, Switzerland
9 Matthew Kweskin: National Museum of Natural History, Smithsonian Institution, Washington, D.C., United States
10 Tjaša Lokovšek: EZ Lab, Institute of Biology, Research Centre of the Slovenian Academy of Sciences and Arts, Ljubljana, Slovenia
11 Miha Pipan: EZ Lab, Institute of Biology, Research Centre of the Slovenian Academy of Sciences and Arts, Ljubljana, Slovenia
12 Nina Vidergar: EZ Lab, Institute of Biology, Research Centre of the Slovenian Academy of Sciences and Arts, Ljubljana, Slovenia
13 Matjaž Kuntner: National Museum of Natural History, Smithsonian Institution, Washington, D.C., United States
topic Taxonomic impediment
Family
Genus
Global Genome Initiative
Genome
DNA barcoding
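Illustrative note: the description in this record reports heuristic thresholds for spiders (genus at PIdent > 95, family at PIdent ≥ 91). The sketch below shows one way such thresholds might be applied to BLAST hits. It is not the authors' code: the `Hit` type, the function name, and the choice to use only the single best hit (rather than all of the top ten) are assumptions made for illustration, and the taxon names in the usage note are placeholders.

```python
from typing import List, NamedTuple, Optional, Tuple

# Thresholds as stated in the record's description (spiders, CO1).
GENUS_PIDENT = 95.0   # genus assigned only when PIdent is strictly above this
FAMILY_PIDENT = 91.0  # family assigned when PIdent is at or above this


class Hit(NamedTuple):
    """One BLAST hit against the reference library (hypothetical structure)."""
    genus: str
    family: str
    pident: float  # percent sequence identity


def assign_higher_taxon(hits: List[Hit]) -> Optional[Tuple[str, str]]:
    """Return (rank, name) for the best hit, or None if no threshold is met.

    Assumption: only the hit with the highest PIdent is considered.
    """
    if not hits:
        return None
    best = max(hits, key=lambda h: h.pident)
    if best.pident > GENUS_PIDENT:
        return ("genus", best.genus)
    if best.pident >= FAMILY_PIDENT:
        return ("family", best.family)
    return None
```

For example, `assign_higher_taxon([Hit("GenusA", "FamilyX", 93.4)])` falls between the two thresholds and so yields a family-level assignment only.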