Assessing the effect of sample bias correction in species distribution models

1. Open-source biodiversity databases contain a large number of species occurrence records but are often spatially biased, which affects the reliability of species distribution models based on these records. Sample bias correction techniques require data filtering which comes at the cost of record n...


Bibliographic Details
Main Authors:	Nicolas Dubos, Clémentine Préau, Maxime Lenormand, Guillaume Papuga, Sophie Monsarrat, Pierre Denelle, Marine Le Louarn, Stien Heremans, Roel May, Philip Roche, Sandra Luque
Format:	Article
Language:	English
Published:	Elsevier 2022-12-01
Series:	Ecological Indicators
Subjects:	Accessibility maps; Cross-validation; Performance metrics; Overlap; Pseudo-absence selection; Terrestrial vertebrates
Online Access:	http://www.sciencedirect.com/science/article/pii/S1470160X22009608

collection	DOAJ
description	1. Open-source biodiversity databases contain a large number of species occurrence records but are often spatially biased, which affects the reliability of species distribution models based on these records. Sample bias correction techniques either require data filtering, which reduces the number of records, or require considerable additional sampling effort. Because independent data are rarely available, assessment of the correction technique often relies solely on performance metrics computed using subsets of the available (biased) data, which may prove misleading.
2. Here, we assess the extent to which an acknowledged sample bias correction technique is likely to improve models’ ability to predict species distributions in the absence of independent data. We assessed the variation in model predictions induced by the correction and by model stochasticity, i.e. the variability between model replicates related to a random component (pseudo-absence sets and cross-validation subsets). We then present an index of the effect of correction relative to model stochasticity: the Relative Overlap Index (ROI). We investigated whether the ROI better represented the effect of correction than classic performance metrics (Boyce index, cAUC, AUC and TSS) and absolute overlap metrics (Schoener’s D, Pearson’s and Spearman’s correlation coefficients), using data for 64 vertebrate species and 21 virtual species with a generated sample bias.
3. When based on absolute overlaps and cross-validation performance metrics, we found that correction produced no significant effects. When considered relative to model stochasticity, the effect of correction was strong for most species at one of the three sites. The use of virtual species enabled us to verify that the correction technique improved both distribution predictions and the biological relevance of the selected variables at that site, when these were not correlated with sample bias patterns.
4. In the absence of additional independent data, the assessment of sample bias correction based on subsample data may be misleading. We propose investigating both the biological relevance of the selected environmental variables and the effect of sample bias correction relative to model stochasticity.
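As a rough illustration of the overlap-based reasoning in the abstract, the sketch below computes Schoener's D between suitability maps and then compares the overlap between uncorrected and corrected predictions with the overlap among replicates. Schoener's D follows its standard definition; the function `relative_overlap` and its formula are hypothetical and are only one plausible way to express "effect of correction relative to model stochasticity". They are not taken from the article and should not be read as the authors' ROI definition. Maps are assumed to be flat lists of non-negative suitability values over the same cells.

```python
from itertools import combinations
from statistics import mean


def schoener_d(a, b):
    """Schoener's D between two non-negative suitability maps.

    Each map is rescaled to sum to 1; D = 1 - 0.5 * sum(|p_a - p_b|),
    so 1 means identical maps and 0 means no overlap.
    """
    if len(a) != len(b):
        raise ValueError("maps must cover the same cells")
    total_a, total_b = sum(a), sum(b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each map must have a positive total")
    return 1.0 - 0.5 * sum(abs(x / total_a - y / total_b) for x, y in zip(a, b))


def mean_within_overlap(replicates):
    """Mean pairwise D among replicates of one group (needs >= 2 maps)."""
    return mean(schoener_d(a, b) for a, b in combinations(replicates, 2))


def mean_between_overlap(group_a, group_b):
    """Mean D over all pairs formed by one map from each group."""
    return mean(schoener_d(a, b) for a in group_a for b in group_b)


def relative_overlap(uncorrected, corrected):
    """Hypothetical relative index (an assumption, not the article's formula).

    Close to 0: correction changes predictions no more than replicate noise.
    Close to 1: correction changes predictions far more than replicate noise.
    """
    within = mean([mean_within_overlap(uncorrected),
                   mean_within_overlap(corrected)])
    between = mean_between_overlap(uncorrected, corrected)
    return (within - between) / within


if __name__ == "__main__":
    # Toy maps over three cells, two replicates per group (made-up numbers).
    uncorrected = [[0.20, 0.50, 0.30], [0.25, 0.45, 0.30]]
    corrected = [[0.60, 0.30, 0.10], [0.55, 0.35, 0.10]]
    print(relative_overlap(uncorrected, corrected))
```

Normalising the between-group overlap by the within-group overlap is the design choice of interest here: an absolute D of, say, 0.65 only signals a meaningful effect of correction if replicates of the same model overlap much more than that.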
first_indexed	2024-04-11T14:52:05Z
id	doaj.art-c13052460834401099160ecbd262d9f8
institution	Directory Open Access Journal
issn	1470-160X
spelling	doaj.art-c13052460834401099160ecbd262d9f8; 2022-12-22T04:17:23Z; eng; Elsevier; Ecological Indicators; ISSN 1470-160X; 2022-12-01; vol. 145; article 109487
	Author affiliations:
	Nicolas Dubos (corresponding author): INRAE, National Research Institute on Agriculture, Food & the Environment, TETIS Unit, Montpellier, France
	Clémentine Préau: INRAE, National Research Institute on Agriculture, Food & the Environment, TETIS Unit, Montpellier, France
	Maxime Lenormand (corresponding author): INRAE, National Research Institute on Agriculture, Food & the Environment, TETIS Unit, Montpellier, France
	Guillaume Papuga: AMAP, Univ Montpellier, CIRAD, CNRS, INRAE, IRD, Montpellier, France
	Sophie Monsarrat: Center for Biodiversity Dynamics in a Changing World (BIOCHANGE), Department of Biology, Aarhus University, Ny Munkegade 114, DK-8000 Aarhus C, Denmark; Section for Ecoinformatics and Biodiversity, Department of Biology, Aarhus University, Ny Munkegade 114, DK-8000 Aarhus C, Denmark
	Pierre Denelle: INRAE, National Research Institute on Agriculture, Food & the Environment, TETIS Unit, Montpellier, France; Biodiversity, Macroecology & Biogeography, University of Göttingen, Göttingen, Germany
	Marine Le Louarn: INRAE, National Research Institute on Agriculture, Food & the Environment, TETIS Unit, Montpellier, France
	Stien Heremans: Research Institute for Nature and Forest (INBO), Brussels, Belgium
	Roel May: Norwegian Institute for Nature Research (NINA), P.O. Box 5685 Torgarden, NO-7485 Trondheim, Norway
	Philip Roche: INRAE, Aix Marseille Univ, RECOVER, Aix-en-Provence, France
	Sandra Luque: INRAE, National Research Institute on Agriculture, Food & the Environment, TETIS Unit, Montpellier, France


Similar Items