Gap-filling a spatially explicit plant trait database: comparing imputation methods and different levels of environmental information


Bibliographic Details
Main Authors: R. Poyatos, O. Sus, L. Badiella, M. Mencuccini, J. Martínez-Vilalta
Format: Article
Language: English
Published: Copernicus Publications 2018-05-01
Series: Biogeosciences
Online Access: https://www.biogeosciences.net/15/2601/2018/bg-15-2601-2018.pdf
collection DOAJ
description The ubiquity of missing data in plant trait databases may hinder trait-based analyses of ecological patterns and processes. Spatially explicit datasets with information on intraspecific trait variability are rare but offer great promise in improving our understanding of functional biogeography. At the same time, they offer specific challenges in terms of data imputation. Here we compare statistical imputation approaches, using varying levels of environmental information, for five plant traits (leaf biomass to sapwood area ratio, leaf nitrogen content, maximum tree height, leaf mass per area and wood density) in a spatially explicit plant trait dataset of temperate and Mediterranean tree species (Ecological and Forest Inventory of Catalonia, IEFC, dataset for Catalonia, north-east Iberian Peninsula, 31 900 km²). We simulated gaps at different missingness levels (10–80 %) in a complete trait matrix, and we used overall trait means, species means, k nearest neighbours (kNN), ordinary and regression kriging, and multivariate imputation using chained equations (MICE) to impute missing trait values. We assessed these methods in terms of their accuracy and of their ability to preserve trait distributions, multi-trait correlation structure and bivariate trait relationships. The relatively good performance of mean and species mean imputations in terms of accuracy masked a poor representation of trait distributions and multivariate trait structure. Species identity improved MICE imputations for all traits, whereas forest structure and topography improved imputations for some traits. No method performed best consistently for the five studied traits, but, considering all traits and performance metrics, MICE informed by relevant ecological variables gave the best results. However, at higher missingness (> 30 %), species mean imputations and regression kriging tended to outperform MICE for some traits.
MICE informed by relevant ecological variables allowed us to fill the gaps in the IEFC incomplete dataset (5495 plots) and quantify imputation uncertainty. Resulting spatial patterns of the studied traits in Catalan forests were broadly similar when using species means, regression kriging or the best-performing MICE application, but some important discrepancies were observed at the local level. Our results highlight the need to assess imputation quality beyond just imputation accuracy and show that including environmental information in statistical imputation approaches yields more plausible imputations in spatially explicit plant trait datasets.
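The evaluation design described in the abstract (hide known values at a chosen missingness level, impute them, score the imputations against the hidden truth) can be illustrated with a minimal sketch. This is not the authors' code. It runs on synthetic data, covers only the two simplest methods named above (overall trait mean and species mean), and every name in it (`simulate_gaps`, `sp_a`, the trait values) is a hypothetical chosen for illustration. kNN, kriging and MICE are not shown.

```python
import math
import random
from statistics import mean


def simulate_gaps(values, missingness, rng):
    """Return a copy of `values` with a fraction `missingness` replaced by None."""
    n_missing = round(len(values) * missingness)
    hidden = set(rng.sample(range(len(values)), n_missing))
    return [None if i in hidden else v for i, v in enumerate(values)]


def impute_overall_mean(species, gappy):
    """Fill every gap with the mean of all observed values."""
    fill = mean(v for v in gappy if v is not None)
    return [fill if v is None else v for v in gappy]


def impute_species_mean(species, gappy):
    """Fill each gap with the observed mean of the same species.

    Falls back to the overall mean if a species has no observed value.
    """
    overall = mean(v for v in gappy if v is not None)
    observed = {}
    for sp, v in zip(species, gappy):
        if v is not None:
            observed.setdefault(sp, []).append(v)
    sp_means = {sp: mean(vs) for sp, vs in observed.items()}
    return [sp_means.get(sp, overall) if v is None else v
            for sp, v in zip(species, gappy)]


def rmse(truth, imputed, gappy):
    """Root mean squared error over the artificially hidden cells only."""
    errors = [(t - p) ** 2 for t, p, g in zip(truth, imputed, gappy) if g is None]
    return math.sqrt(mean(errors))


# Synthetic stand-in for one trait: three hypothetical species whose
# trait means differ, with small within-species variation.
rng = random.Random(42)
species_means = {"sp_a": 0.45, "sp_b": 0.60, "sp_c": 0.80}
species = [rng.choice(sorted(species_means)) for _ in range(300)]
truth = [rng.gauss(species_means[sp], 0.03) for sp in species]

results = {}
for missingness in (0.1, 0.3, 0.5, 0.8):
    gappy = simulate_gaps(truth, missingness, rng)
    results[missingness] = (
        rmse(truth, impute_overall_mean(species, gappy), gappy),
        rmse(truth, impute_species_mean(species, gappy), gappy),
    )

for missingness, (err_overall, err_species) in results.items():
    print(f"{missingness:.0%} missing: overall mean RMSE={err_overall:.3f}, "
          f"species mean RMSE={err_species:.3f}")
```

RMSE measures accuracy only. As the abstract stresses, a method can score well on accuracy while distorting trait distributions, so a fuller comparison would also check, for example, the variance of the imputed trait and its correlations with other traits.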
id doaj.art-c1897a8d3a3a4d00b2cf88d1064685d2
institution Directory Open Access Journal
issn 1726-4170
1726-4189
doi 10.5194/bg-15-2601-2018
volume 15
pages 2601-2617
author_affiliation R. Poyatos: CREAF, E08193 Bellaterra (Cerdanyola del Vallès), Catalonia, Spain; Laboratory of Plant Ecology, Faculty of Bioscience Engineering, Ghent University, Coupure links 653, 9000 Gent, Belgium
O. Sus: EUMETSAT, Eumetsat Allee 1, 64295 Darmstadt, Germany
L. Badiella: Servei d'Estadística Aplicada, Universitat Autònoma de Barcelona, Cerdanyola del Vallès 08193, Barcelona, Spain
M. Mencuccini: CREAF, E08193 Bellaterra (Cerdanyola del Vallès), Catalonia, Spain; ICREA, Barcelona, Spain
J. Martínez-Vilalta: CREAF, E08193 Bellaterra (Cerdanyola del Vallès), Catalonia, Spain; Universitat Autònoma de Barcelona, E08193 Bellaterra (Cerdanyola del Vallès), Catalonia, Spain