Gap-filling a spatially explicit plant trait database: comparing imputation methods and different levels of environmental information
The ubiquity of missing data in plant trait databases may hinder trait-based analyses of ecological patterns and processes. Spatially explicit datasets with information on intraspecific trait variability are rare but offer great promise in improving our understanding of functional biogeography. A...
Main Authors: | R. Poyatos, O. Sus, L. Badiella, M. Mencuccini, J. Martínez-Vilalta |
---|---|
Format: | Article |
Language: | English |
Published: | Copernicus Publications, 2018-05-01 |
Series: | Biogeosciences |
Online Access: | https://www.biogeosciences.net/15/2601/2018/bg-15-2601-2018.pdf |
_version_ | 1811220766057824256 |
---|---|
author | R. Poyatos, O. Sus, L. Badiella, M. Mencuccini, J. Martínez-Vilalta |
author_facet | R. Poyatos, O. Sus, L. Badiella, M. Mencuccini, J. Martínez-Vilalta |
author_sort | R. Poyatos |
collection | DOAJ |
description | The ubiquity of missing data in plant trait databases may hinder trait-based
analyses of ecological patterns and processes. Spatially explicit datasets
with information on intraspecific trait variability are rare but offer great
promise in improving our understanding of functional biogeography. At the
same time, they offer specific challenges in terms of data imputation. Here
we compare statistical imputation approaches, using varying levels of
environmental information, for five plant traits (leaf biomass to sapwood
area ratio, leaf nitrogen content, maximum tree height, leaf mass per area
and wood density) in a spatially explicit plant trait dataset of temperate
and Mediterranean tree species (Ecological and Forest Inventory of Catalonia,
IEFC, dataset for Catalonia, north-east Iberian Peninsula,
31 900 km²). We simulated gaps at different missingness levels
(10–80 %) in a complete trait matrix, and we used overall trait means,
species means, k nearest neighbours (kNN), ordinary and regression kriging,
and multivariate imputation using chained equations (MICE) to impute missing
trait values. We assessed these methods in terms of their accuracy and of
their ability to preserve trait distributions, multi-trait correlation
structure and bivariate trait relationships. The relatively good performance
of mean and species mean imputations in terms of accuracy masked a poor
representation of trait distributions and multivariate trait structure.
Species identity improved MICE imputations for all traits, whereas forest
structure and topography improved imputations for some traits. No method
performed best consistently for the five studied traits, but, considering all
traits and performance metrics, MICE informed by relevant ecological
variables gave the best results. However, at higher missingness
(> 30 %), species mean imputations and regression kriging tended to
outperform MICE for some traits. MICE informed by relevant ecological
variables allowed us to fill the gaps in the IEFC incomplete dataset
(5495 plots) and quantify imputation uncertainty. Resulting spatial patterns
of the studied traits in Catalan forests were broadly similar when using
species means, regression kriging or the best-performing MICE application,
but some important discrepancies were observed at the local level. Our
results highlight the need to assess imputation quality beyond just
imputation accuracy and show that including environmental information in
statistical imputation approaches yields more plausible imputations in
spatially explicit plant trait datasets. |
first_indexed | 2024-04-12T07:47:05Z |
format | Article |
id | doaj.art-c1897a8d3a3a4d00b2cf88d1064685d2 |
institution | Directory of Open Access Journals |
issn | 1726-4170 1726-4189 |
language | English |
last_indexed | 2024-04-12T07:47:05Z |
publishDate | 2018-05-01 |
publisher | Copernicus Publications |
record_format | Article |
series | Biogeosciences |
spelling | doaj.art-c1897a8d3a3a4d00b2cf88d1064685d2; 2022-12-22T03:41:41Z; eng; Copernicus Publications; Biogeosciences; ISSN 1726-4170, 1726-4189; 2018-05-01; vol. 15, pp. 2601–2617; doi:10.5194/bg-15-2601-2018.
Gap-filling a spatially explicit plant trait database: comparing imputation methods and different levels of environmental information.
R. Poyatos (CREAF, E08193 Bellaterra (Cerdanyola del Vallès), Catalonia, Spain; Laboratory of Plant Ecology, Faculty of Bioscience Engineering, Ghent University, Coupure links 653, 9000 Gent, Belgium);
O. Sus (EUMETSAT, Eumetsat Allee 1, 64295 Darmstadt, Germany);
L. Badiella (Servei d'Estadística Aplicada, Universitat Autònoma de Barcelona, Cerdanyola del Vallès 08193, Barcelona, Spain);
M. Mencuccini (CREAF, E08193 Bellaterra (Cerdanyola del Vallès), Catalonia, Spain; ICREA, Barcelona, Spain);
J. Martínez-Vilalta (CREAF, E08193 Bellaterra (Cerdanyola del Vallès), Catalonia, Spain; Universitat Autònoma de Barcelona, E08193 Bellaterra (Cerdanyola del Vallès), Catalonia, Spain).
https://www.biogeosciences.net/15/2601/2018/bg-15-2601-2018.pdf |
title | Gap-filling a spatially explicit plant trait database: comparing imputation methods and different levels of environmental information |
url | https://www.biogeosciences.net/15/2601/2018/bg-15-2601-2018.pdf |
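
The evaluation design summarised in the description (blank out values in a complete trait matrix at several missingness levels, impute them, then compare against the known truth) can be illustrated with a minimal sketch. This is not the authors' code: the species names, trait values, random seeds and the missing-completely-at-random masking are all assumptions made for illustration, and only the two simplest methods (overall mean and species mean) are shown; kNN, kriging and MICE are omitted.

```python
import math
import random
import statistics


def simulate_plots(n_per_species=200, seed=1):
    """Return (species, trait) pairs for a synthetic, complete trait vector."""
    rng = random.Random(seed)
    # Hypothetical species-level means for a single trait (illustrative values).
    species_means = {"sp_a": 0.45, "sp_b": 0.60, "sp_c": 0.80}
    rows = []
    for sp, mu in species_means.items():
        for _ in range(n_per_species):
            rows.append((sp, rng.gauss(mu, 0.05)))
    return rows


def mask_at_random(n, missingness, seed=2):
    """Choose the indices to blank out, completely at random (an assumption)."""
    rng = random.Random(seed)
    return set(rng.sample(range(n), round(n * missingness)))


def impute(rows, missing, by_species):
    """Fill masked values with the overall or species mean of the observed values."""
    observed = [(sp, v) for i, (sp, v) in enumerate(rows) if i not in missing]
    overall = statistics.fmean(v for _, v in observed)
    per_species = {}
    for sp, v in observed:
        per_species.setdefault(sp, []).append(v)
    means = {sp: statistics.fmean(vs) for sp, vs in per_species.items()}
    filled = []
    for i, (sp, v) in enumerate(rows):
        if i in missing:
            # Fall back to the overall mean if a species has no observed value left.
            v = means.get(sp, overall) if by_species else overall
        filled.append(v)
    return filled


def rmse(truth, filled, missing):
    """Root-mean-square error over the masked positions only."""
    return math.sqrt(sum((truth[i] - filled[i]) ** 2 for i in missing) / len(missing))


rows = simulate_plots()
truth = [v for _, v in rows]
results = {}
for level in (0.1, 0.3, 0.8):
    missing = mask_at_random(len(rows), level)
    for name, by_species in (("overall mean", False), ("species mean", True)):
        filled = impute(rows, missing, by_species)
        # Store accuracy (RMSE) and the spread of the gap-filled trait vector.
        results[(level, name)] = (rmse(truth, filled, missing), statistics.pstdev(filled))

true_sd = statistics.pstdev(truth)
for (level, name), (err, sd) in results.items():
    print(f"{level:.0%} missing, {name}: RMSE={err:.3f}, SD={sd:.3f} (true SD={true_sd:.3f})")
```

Reporting the spread of the gap-filled vector next to the RMSE reflects the point made in the description: a mean-based method can look acceptable on accuracy alone while compressing the trait distribution, which is why the study assesses imputation quality beyond accuracy.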