The importance of data quality for generating reliable distribution models for rare, elusive, and cryptic species.

Bibliographic Details
Main Authors: Keith B Aubry, Catherine M Raley, Kevin S McKelvey
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2017-01-01
Series: PLoS ONE
Online Access: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0179152&type=printable
collection DOAJ
description The availability of spatially referenced environmental data and species occurrence records in online databases enables practitioners to easily generate species distribution models (SDMs) for a broad array of taxa. Such databases often include occurrence records of unknown reliability, yet little information is available on the influence of data quality on SDMs generated for rare, elusive, and cryptic species that are prone to misidentification in the field. We investigated this question for the fisher (Pekania pennanti), a forest carnivore of conservation concern in the Pacific States that is often confused with the more common Pacific marten (Martes caurina). Fisher occurrence records supported by physical evidence (verifiable records) were available from a limited area, whereas occurrence records of unknown quality (unscreened records) were available from throughout the fisher's historical range. We reserved 20% of the verifiable records to use as a test sample for both models and generated SDMs with each dataset using Maxent. The verifiable model performed substantially better than the unscreened model based on multiple metrics including AUCtest values (0.78 and 0.62, respectively), evaluation of training and test gains, and statistical tests of how well each model predicted test localities. In addition, the verifiable model was consistent with our knowledge of the fisher's habitat relations and potential distribution, whereas the unscreened model indicated a much broader area of high-quality habitat (indices > 0.5) that included large expanses of high-elevation habitat that fishers do not occupy. Because Pacific martens remain relatively common in upper elevation habitats in the Cascade Range and Sierra Nevada, the SDM based on unscreened records likely reflects primarily a conflation of marten and fisher habitat. Consequently, accurate identifications are far more important than the spatial extent of occurrence records for generating reliable SDMs for the fisher in this region. We strongly recommend that practitioners avoid using anecdotal occurrence records to build SDMs but, if such data are used, the validity of resulting models should be tested with verifiable occurrence records.
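The evaluation design in the abstract (reserve 20% of the verifiable records as a test sample, then score each model by a test AUC) can be illustrated with a minimal sketch. This is not the authors' Maxent workflow: the record IDs, the suitability scores, and the helper names `split_holdout` and `auc` are all hypothetical, and the AUC here is the plain rank-based form (the probability that a test presence outscores a background point), which is assumed to approximate how a presence/background test AUC is computed.

```python
import random


def split_holdout(records, test_fraction=0.2, seed=0):
    """Shuffle records and reserve a fraction as a test sample.

    Returns (train, test). The 0.2 default mirrors the 20% hold-out
    mentioned in the abstract; the seed is an arbitrary assumption.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def auc(presence_scores, background_scores):
    """Rank-based AUC: chance that a random test presence scores higher
    than a random background point, counting ties as half a win."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical verifiable record IDs (not data from the study).
verifiable = [f"rec{i}" for i in range(50)]
train, test = split_holdout(verifiable)

# Hypothetical model suitability scores at test localities and at
# background points; a real analysis would take these from the fitted SDM.
test_scores = [0.9, 0.7, 0.6, 0.4]
background_scores = [0.8, 0.5, 0.3, 0.2, 0.1]

print(len(train), len(test))
print(auc(test_scores, background_scores))
```

Scoring both models against the same held-out verifiable records, as the abstract describes, is what makes the two AUC values comparable: the test sample does not depend on which dataset was used for training.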
first_indexed 2024-12-11T15:49:52Z
format Article
id doaj.art-7f5c1b3b760f49489abc043c3e01127a
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2025-03-14T14:18:19Z
spelling doaj.art-7f5c1b3b760f49489abc043c3e01127a; 2025-02-27T05:32:48Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2017-01-01; 12(6): e0179152; doi:10.1371/journal.pone.0179152