Missing data imputation of high‐resolution temporal climate time series data

Abstract Analysis of high‐resolution data offers greater opportunity to understand the nature of data variability, behaviours, trends and to detect small changes. Climate studies often require complete time series data which, in the presence of missing data, means imputation must be undertaken. Rese...


Bibliographic Details
Main Authors: E Afrifa‐Yamoah, U. A. Mueller, S. M. Taylor, A. J. Fisher
Format: Article
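Language: English
Published: Wiley 2020-01-01
Series: Meteorological Applications
Subjects: high‐resolution climate time series data; imputation; missing observations; short cycle duration; state‐space modelling
Online Access: https://doi.org/10.1002/met.1873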
author E Afrifa‐Yamoah
U. A. Mueller
S. M. Taylor
A. J. Fisher
collection DOAJ
description Abstract Analysis of high‐resolution data offers greater opportunity to understand the nature of data variability, behaviours, trends and to detect small changes. Climate studies often require complete time series data which, in the presence of missing data, means imputation must be undertaken. Research on the imputation of high‐resolution temporal climate time series data is still at an early phase. In this study, multiple approaches to the imputation of missing values were evaluated, including a structural time series model with Kalman smoothing, an autoregressive integrated moving average (ARIMA) model with Kalman smoothing and multiple linear regression. The methods were applied to complete subsets of data from 12 month time series of hourly temperature, humidity and wind speed data from four locations along the coast of Western Australia. Assuming that observations were missing at random, artificial gaps of missing observations were studied using a five‐fold cross‐validation methodology with the proportion of missing data set to 10%. The techniques were compared using the pooled mean absolute error, root mean square error and symmetric mean absolute percentage error. The multiple linear regression model was generally the best model based on the pooled performance indicators, followed by the ARIMA with Kalman smoothing. However, the low error values obtained from each of the approaches suggested that the models competed closely and imputed highly plausible values. To some extent, the performance of the models varied among locations. It can be concluded that the modelling approaches studied have demonstrated suitability in imputing missing data in hourly temperature, humidity and wind speed data and are therefore recommended for application in other fields where high‐resolution data with missing values are common.
first_indexed 2024-04-10T08:46:50Z
format Article
id doaj.art-6357c90b3cc74de3817383d660e90fe9
institution Directory of Open Access Journals
issn 1350-4827
1469-8080
language English
last_indexed 2024-04-10T08:46:50Z
publishDate 2020-01-01
publisher Wiley
record_format Article
series Meteorological Applications
spelling doaj.art-6357c90b3cc74de3817383d660e90fe9 2023-02-22T07:11:32Z eng Wiley Meteorological Applications 1350-4827 1469-8080 2020-01-01 volume 27, issue 1, pages n/a 10.1002/met.1873
E Afrifa‐Yamoah (School of Science, Edith Cowan University, Joondalup, Australia)
U. A. Mueller (School of Science, Edith Cowan University, Joondalup, Australia)
S. M. Taylor (Department of Primary Industries and Regional Development (DPIRD), Western Australian Fisheries and Marine Research Laboratories, North Beach, Australia)
A. J. Fisher (School of Science, Edith Cowan University, Joondalup, Australia)
title Missing data imputation of high‐resolution temporal climate time series data
topic high‐resolution climate time series data
imputation
missing observations
short cycle duration
state‐space modelling
url https://doi.org/10.1002/met.1873
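
The evaluation described in the abstract (artificial gaps covering 10% of a complete series, scored with pooled mean absolute error, root mean square error and symmetric mean absolute percentage error) can be sketched roughly as follows. This is a minimal illustration, not the authors' code. The synthetic temperature series, the function names and the sMAPE variant are all assumptions, since the abstract does not give them. Linear interpolation stands in for the imputation step and is not one of the three models studied. Repeated random masking is used here in place of the five-fold cross-validation design, whose details are not stated in this record.

```python
import numpy as np


def mae(y, yhat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - yhat)))


def rmse(y, yhat):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y - yhat) ** 2)))


def smape(y, yhat):
    """Symmetric MAPE in percent (one common variant; an assumption here)."""
    denom = np.abs(y) + np.abs(yhat)
    # Points where both values are zero contribute 0 instead of dividing by zero.
    ratio = np.divide(2.0 * np.abs(y - yhat), denom,
                      out=np.zeros_like(denom, dtype=float), where=denom > 0)
    return float(100.0 * np.mean(ratio))


def interpolate_gaps(values):
    """Stand-in imputer: linear interpolation across NaN gaps.

    Hypothetical placeholder; swap in a Kalman-smoothing or regression
    imputer to mirror the approaches named in the abstract.
    """
    idx = np.arange(values.size)
    known = ~np.isnan(values)
    return np.interp(idx, idx[known], values[known])


def evaluate_imputer(series, impute_fn, missing_frac=0.10, n_rounds=5, seed=0):
    """Hide a random fraction of points, impute them, and pool the errors."""
    rng = np.random.default_rng(seed)
    n_missing = int(round(missing_frac * series.size))
    truth, imputed = [], []
    for _ in range(n_rounds):
        hidden = rng.choice(series.size, size=n_missing, replace=False)
        masked = series.astype(float).copy()
        masked[hidden] = np.nan          # create the artificial gaps
        filled = impute_fn(masked)
        truth.append(series[hidden])     # held-out true values
        imputed.append(filled[hidden])   # imputed values at the same hours
    y, yhat = np.concatenate(truth), np.concatenate(imputed)
    return {"mae": mae(y, yhat), "rmse": rmse(y, yhat), "smape": smape(y, yhat)}


# Synthetic hourly "temperature" with a daily cycle (illustrative data only).
hours = np.arange(24 * 30)
noise = np.random.default_rng(42).normal(0.0, 0.3, hours.size)
temperature = 20.0 + 5.0 * np.sin(2.0 * np.pi * hours / 24.0) + noise

scores = evaluate_imputer(temperature, interpolate_gaps)
print(scores)
```

Pooling the held-out points from every round before computing each metric gives one score per method, which is one plausible reading of "pooled" in the abstract. Passing a different `impute_fn` allows several imputers to be compared on the same footing.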