Not seeing the forest for the trees: Generalised linear model out-performs random forest in species distribution modelling for Southeast Asian felids


Bibliographic Details
Main Authors:	Chiaverini, L; Macdonald, DW; Hearn, AJ; Kaszta, Ż; Ash, E; Bothwell, HM; Can, ÖE; Channa, P; Clements, GR; Haidir, IA; Kyaw, PP; Moore, JH; Rasphone, A; Tan, CKW; Cushman, SA
Format:	Journal article
Language:	English
Published:	Elsevier 2023

author	Chiaverini, L; Macdonald, DW; Hearn, AJ; Kaszta, Ż; Ash, E; Bothwell, HM; Can, ÖE; Channa, P; Clements, GR; Haidir, IA; Kyaw, PP; Moore, JH; Rasphone, A; Tan, CKW; Cushman, SA
author_sort	Chiaverini, L
collection	OXFORD
description	Species Distribution Models (SDMs) are a powerful tool for deriving habitat suitability predictions by relating species occurrence data to habitat features. Two of the most frequently applied algorithms for modelling species-habitat relationships are Generalised Linear Models (GLM) and Random Forest (RF). The former is a parametric regression model that yields functional models with direct interpretability. The latter is a non-parametric machine-learning algorithm, less restrictive in its assumptions than other approaches, which has often been shown to outperform parametric algorithms. Other approaches, such as training data bootstrapping and spatial scale optimisation, have been developed to produce robust SDMs. Using felid presence-absence data from three study regions in Southeast Asia (mainland, Borneo and Sumatra), we tested the performance of SDMs by implementing four modelling frameworks: GLM and RF, each with bootstrapped and non-bootstrapped training data. With Mantel and ANOVA tests we explored how the four combinations of algorithm and bootstrapping influenced SDMs and their predictive performance. Additionally, we tested how scale optimisation responded to species' size, taxonomic associations (species and genus), study area and algorithm. We found that the choice of algorithm had a strong effect in determining the differences between SDMs' spatial predictions, while bootstrapping had no effect. The main factors driving differences in the spatial scales identified were algorithm, followed by study area and species. SDMs trained with GLM showed higher predictive performance; however, ANOVA tests revealed that algorithm had a significant effect only in explaining the variance observed in sensitivity and specificity and, when interacting with bootstrapping, in Percent Correctly Classified (PCC). Bootstrapping significantly explained the variance in specificity, PCC and the True Skill Statistic (TSS).
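Sensitivity, specificity, PCC and TSS are standard threshold-based metrics computed from a presence-absence confusion matrix. The sketch below shows only their textbook definitions in plain Python; the 0.5 classification threshold, the helper names and the example data are hypothetical, and this record does not state which threshold or implementation the authors used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # observed presence, predicted presence
    fp: int  # observed absence, predicted presence
    fn: int  # observed presence, predicted absence
    tn: int  # observed absence, predicted absence


def confusion_counts(observed, probabilities, threshold=0.5):
    """Classify predicted suitability at a (hypothetical) threshold and tally outcomes.

    observed: 1 for presence, 0 for absence.
    """
    if len(observed) != len(probabilities):
        raise ValueError("observed and probabilities must have equal length")
    tp = fp = fn = tn = 0
    for obs, prob in zip(observed, probabilities):
        predicted = 1 if prob >= threshold else 0
        if obs == 1 and predicted == 1:
            tp += 1
        elif obs == 0 and predicted == 1:
            fp += 1
        elif obs == 1:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def threshold_metrics(c):
    """Sensitivity, specificity, PCC and TSS from confusion counts."""
    sensitivity = c.tp / (c.tp + c.fn)  # share of presences correctly predicted
    specificity = c.tn / (c.tn + c.fp)  # share of absences correctly predicted
    pcc = (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)  # Percent Correctly Classified, as a proportion
    tss = sensitivity + specificity - 1  # True Skill Statistic, ranges from -1 to 1
    return {"sensitivity": sensitivity, "specificity": specificity, "pcc": pcc, "tss": tss}


if __name__ == "__main__":
    # Made-up observations and predicted suitability for eight sites.
    observed = [1, 1, 1, 0, 0, 0, 0, 1]
    probabilities = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1, 0.3, 0.8]
    counts = confusion_counts(observed, probabilities)
    print(counts)  # ConfusionCounts(tp=3, fp=1, fn=1, tn=3)
    print(threshold_metrics(counts))
    # {'sensitivity': 0.75, 'specificity': 0.75, 'pcc': 0.75, 'tss': 0.5}
```

Because TSS adds sensitivity and specificity, it rewards a model only when it does well on both presences and absences, which is one reason it is often reported alongside PCC.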
Our results suggest that there are systematic differences in the scales identified and in the predictions produced by GLM vs. RF, but that neither approach was consistently better than the other. The divergent predictions and inconsistent predictive abilities suggest that analysts should not assume machine learning is inherently superior and should test multiple methods. Our results have strong implications for SDM development, revealing the inconsistencies that the choice of algorithm introduces into scale optimisation, with GLM selecting broader scales than RF.
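The description reports using Mantel tests to compare the spatial predictions of the modelling frameworks. This record does not say how those tests were set up, so the following is only a minimal, generic Mantel test in plain Python. It assumes two prediction surfaces sampled at the same cells and turns each into a hypothetical distance matrix of absolute differences in suitability; the helper names and example values are illustrative, not the authors' procedure.

```python
import math
import random


def abs_difference_matrix(values):
    """Hypothetical distance matrix: |suitability_i - suitability_j| for every pair of cells."""
    return [[abs(a - b) for b in values] for a in values]


def upper_triangle(matrix):
    """Entries above the main diagonal, read row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(d1, d2, permutations=999, seed=0):
    """Simple Mantel test: Pearson r between matrices plus a one-sided permutation p-value."""
    rng = random.Random(seed)
    x = upper_triangle(d1)
    r_observed = pearson(x, upper_triangle(d2))
    n = len(d2)
    order = list(range(n))
    as_extreme = 0
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of d2 together so the matrix stays internally consistent.
        permuted = [[d2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= r_observed:
            as_extreme += 1
    # Count the observed arrangement itself, as is conventional for permutation tests.
    p_value = (as_extreme + 1) / (permutations + 1)
    return r_observed, p_value


if __name__ == "__main__":
    # Made-up suitability values for the same eight cells under two models.
    glm = [0.1, 0.3, 0.5, 0.7, 0.9, 0.2, 0.8, 0.4]
    rf = [0.15, 0.35, 0.45, 0.75, 0.85, 0.25, 0.7, 0.5]
    r, p = mantel(abs_difference_matrix(glm), abs_difference_matrix(rf))
    print(f"Mantel r = {r:.3f}, permutation p = {p:.3f}")
```

Whole cells are permuted, rather than individual matrix entries, because pairwise distances sharing a cell are not independent; shuffling entries one by one would understate that dependence and give misleadingly small p-values.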
first_indexed	2024-03-07T07:46:33Z
format	Journal article
id	oxford-uuid:2bf0f5cf-0173-4cb5-968a-473fa7035950
institution	University of Oxford
language	English
last_indexed	2024-03-07T07:46:33Z
publishDate	2023
publisher	Elsevier
record_format	dspace
title	Not seeing the forest for the trees: Generalised linear model out-performs random forest in species distribution modelling for Southeast Asian felids

Similar Items