Generalized Linear Models outperform commonly used canonical analysis in estimating spatial structure of presence/absence data

Background Ecological communities tend to be spatially structured due to environmental gradients and/or spatially contagious processes such as growth, dispersion and species interactions. Data transformation followed by usage of algorithms such as Redundancy Analysis (RDA) is a fairly common approac...


Bibliographic Details
Main Authors:	Lélis A. Carlos-Júnior, Joel C. Creed, Rob Marrs, Rob J. Lewis, Timothy P. Moulton, Rafael Feijó-Lima, Matthew Spencer
Format:	Article
Language:	English
Published:	PeerJ Inc. 2020-09-01
Series:	PeerJ
Subjects:	Redundancy Analysis (RDA), Statistical modelling, Spatial analysis, Spatial ecology, Beta diversity, Moran’s Eigenvector Maps (MEMs)
Online Access:	https://peerj.com/articles/9777.pdf

_version_	1797421850574192640
author	Lélis A. Carlos-Júnior, Joel C. Creed, Rob Marrs, Rob J. Lewis, Timothy P. Moulton, Rafael Feijó-Lima, Matthew Spencer
author_facet	Lélis A. Carlos-Júnior, Joel C. Creed, Rob Marrs, Rob J. Lewis, Timothy P. Moulton, Rafael Feijó-Lima, Matthew Spencer
author_sort	Lélis A. Carlos-Júnior
collection	DOAJ
description	Background: Ecological communities tend to be spatially structured due to environmental gradients and/or spatially contagious processes such as growth, dispersion and species interactions. Data transformation followed by usage of algorithms such as Redundancy Analysis (RDA) is a fairly common approach in studies searching for spatial structure in ecological communities, despite recent suggestions advocating the use of Generalized Linear Models (GLMs). Here, we compared the performance of GLMs and RDA in describing spatial structure in ecological community composition data. We simulated realistic presence/absence data typical of many β-diversity studies. For model selection we used standard methods commonly used in most studies involving RDA and GLMs.
Methods: We simulated communities with known spatial structure, based on three real spatial community presence/absence datasets (one terrestrial, one marine and one freshwater). We used spatial eigenvectors as explanatory variables. We varied the number of non-zero coefficients of the spatial variables, and the spatial scales with which these coefficients were associated and then compared the performance of GLMs and RDA frameworks to correctly retrieve the spatial patterns contained in the simulated communities. We used two different methods for model selection, Forward Selection (FW) for RDA and the Akaike Information Criterion (AIC) for GLMs. The performance of each method was assessed by scoring overall accuracy as the proportion of variables whose inclusion/exclusion status was correct, and by distinguishing which kind of error was observed for each method. We also assessed whether errors in variable selection could affect the interpretation of spatial structure.
Results: Overall GLM with AIC-based model selection (GLM/AIC) performed better than RDA/FW in selecting spatial explanatory variables, although under some simulations the methods performed similarly. In general, RDA/FW performed unpredictably, often retaining too many explanatory variables and selecting variables associated with incorrect spatial scales. The spatial scale of the pattern had a negligible effect on GLM/AIC performance but consistently affected RDA’s error rates under almost all scenarios.
Conclusion: We encourage the use of GLM/AIC for studies searching for spatial drivers of species presence/absence patterns, since this framework outperformed RDA/FW in situations most likely to be found in natural communities. It is likely that such recommendations might extend to other types of explanatory variables.
first_indexed	2024-03-09T07:23:25Z
format	Article
id	doaj.art-e79a82d2907f4c658dfc50e59aed8ec3
institution	Directory Open Access Journal
issn	2167-8359
language	English
last_indexed	2024-03-09T07:23:25Z
publishDate	2020-09-01
publisher	PeerJ Inc.
record_format	Article
series	PeerJ
spelling	doaj.art-e79a82d2907f4c658dfc50e59aed8ec3; 2023-12-03T07:14:31Z; eng; PeerJ Inc.; PeerJ; 2167-8359; 2020-09-01; 8; e9777; 10.7717/peerj.9777
Generalized Linear Models outperform commonly used canonical analysis in estimating spatial structure of presence/absence data
Lélis A. Carlos-Júnior (Programa de Pós-Graduação em Ecologia e Evolução, Universidade do Estado do Rio de Janeiro, Rio de Janeiro, Brazil); Joel C. Creed (Departamento de Ecologia, Universidade do Estado do Rio de Janeiro, Rio de Janeiro, Brazil); Rob Marrs (School of Environmental Sciences, University of Liverpool, Liverpool, United Kingdom); Rob J. Lewis (Department of Forest Genetics and Biodiversity, Norwegian Institute of Bioeconomy Research, Bergen, Norway); Timothy P. Moulton (Departamento de Ecologia, Universidade do Estado do Rio de Janeiro, Rio de Janeiro, Brazil); Rafael Feijó-Lima (Programa de Pós-Graduação em Ecologia e Evolução, Universidade do Estado do Rio de Janeiro, Rio de Janeiro, Brazil); Matthew Spencer (School of Environmental Sciences, University of Liverpool, Liverpool, United Kingdom)
https://peerj.com/articles/9777.pdf
Redundancy Analysis (RDA), Statistical modelling, Spatial analysis, Spatial ecology, Beta diversity, Moran’s Eigenvector Maps (MEMs)
title	Generalized Linear Models outperform commonly used canonical analysis in estimating spatial structure of presence/absence data
title_sort	generalized linear models outperform commonly used canonical analysis in estimating spatial structure of presence absence data
topic	Redundancy Analysis (RDA), Statistical modelling, Spatial analysis, Spatial ecology, Beta diversity, Moran’s Eigenvector Maps (MEMs)
url	https://peerj.com/articles/9777.pdf


Similar Items