Gap-filling a spatially explicit plant trait database: comparing imputation methods and different levels of environmental information
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Copernicus Publications, 2018-05-01 |
Series: | Biogeosciences |
Online Access: | https://www.biogeosciences.net/15/2601/2018/bg-15-2601-2018.pdf |
Summary: | The ubiquity of missing data in plant trait databases may hinder trait-based
analyses of ecological patterns and processes. Spatially explicit datasets
with information on intraspecific trait variability are rare but offer great
promise in improving our understanding of functional biogeography. At the
same time, they offer specific challenges in terms of data imputation. Here
we compare statistical imputation approaches, using varying levels of
environmental information, for five plant traits (leaf biomass to sapwood
area ratio, leaf nitrogen content, maximum tree height, leaf mass per area
and wood density) in a spatially explicit plant trait dataset of temperate
and Mediterranean tree species (Ecological and Forest Inventory of Catalonia,
IEFC, dataset for Catalonia, north-east Iberian Peninsula,
31 900 km²). We simulated gaps at different missingness levels
(10–80 %) in a complete trait matrix, and we used overall trait means,
species means, k nearest neighbours (kNN), ordinary and regression kriging,
and multivariate imputation using chained equations (MICE) to impute missing
trait values. We assessed these methods in terms of their accuracy and of
their ability to preserve trait distributions, multi-trait correlation
structure and bivariate trait relationships. The relatively good performance
of mean and species mean imputations in terms of accuracy masked a poor
representation of trait distributions and multivariate trait structure.
Species identity improved MICE imputations for all traits, whereas forest
structure and topography improved imputations for some traits. No method
performed best consistently for the five studied traits, but, considering all
traits and performance metrics, MICE informed by relevant ecological
variables gave the best results. However, at higher missingness
(> 30 %), species mean imputations and regression kriging tended to
outperform MICE for some traits. MICE informed by relevant ecological
variables allowed us to fill the gaps in the IEFC incomplete dataset
(5495 plots) and quantify imputation uncertainty. Resulting spatial patterns
of the studied traits in Catalan forests were broadly similar when using
species means, regression kriging or the best-performing MICE application,
but some important discrepancies were observed at the local level. Our
results highlight the need to assess imputation quality beyond just
imputation accuracy and show that including environmental information in
statistical imputation approaches yields more plausible imputations in
spatially explicit plant trait datasets. |
ISSN: | 1726-4170 1726-4189 |
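
The gap-simulation experiment described in the Summary can be illustrated with a minimal sketch. It is not the authors' code. It assumes a single trait, values missing completely at random, and synthetic data for two hypothetical species (`sp_a`, `sp_b`) whose trait means and spread are invented for illustration. Only the two simplest methods named in the Summary (overall mean and species mean) are shown; kNN, kriging and MICE are omitted. All function names are hypothetical.

```python
import math
import random
from statistics import mean


def simulate_gaps(values, missing_fraction, rng):
    """Return a copy of `values` with a random fraction replaced by None."""
    n_missing = round(len(values) * missing_fraction)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def impute_overall_mean(values):
    """Fill every gap with the mean of all observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def impute_species_mean(values, species):
    """Fill each gap with the observed mean of the same species.

    Falls back to the overall mean when a species has no observed values.
    """
    overall = mean(v for v in values if v is not None)
    observed_by_species = {}
    for v, s in zip(values, species):
        if v is not None:
            observed_by_species.setdefault(s, []).append(v)
    species_means = {s: mean(vs) for s, vs in observed_by_species.items()}
    return [
        species_means.get(s, overall) if v is None else v
        for v, s in zip(values, species)
    ]


def rmse_on_gaps(truth, gappy, imputed):
    """Root mean squared error, computed only where values were removed."""
    squared_errors = [
        (t - p) ** 2 for t, g, p in zip(truth, gappy, imputed) if g is None
    ]
    return math.sqrt(mean(squared_errors))


# Synthetic "complete" trait vector (assumed values, not IEFC data).
rng = random.Random(42)
species = ["sp_a"] * 50 + ["sp_b"] * 50
truth = [rng.gauss(0.5, 0.03) for _ in range(50)] + [
    rng.gauss(0.8, 0.03) for _ in range(50)
]

# Remove 30 % of the values, impute them, and score each method.
gappy = simulate_gaps(truth, 0.3, rng)
rmse_mean = rmse_on_gaps(truth, gappy, impute_overall_mean(gappy))
rmse_species = rmse_on_gaps(truth, gappy, impute_species_mean(gappy, species))
print(f"overall mean RMSE: {rmse_mean:.3f}")
print(f"species mean RMSE: {rmse_species:.3f}")
```

The sketch scores accuracy only. As the Summary notes, good accuracy of mean-based imputations can mask a poor representation of trait distributions and multivariate structure, so a fuller comparison would also check those properties.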