Assessing Machine Learning Models for Gap Filling Daily Rainfall Series in a Semiarid Region of Spain

The presence of missing data in hydrometeorological datasets is a common problem, usually due to sensor malfunction, deficiencies in records storage and transmission, or other recovery procedures issues. These missing values are the primary source of problems when analyzing and modeling their spatia...

Bibliographic Details
Main Authors: Juan Antonio Bellido-Jiménez, Javier Estévez Gualda, Amanda Penélope García-Marín
Format: Article
Language: English
Published: MDPI AG 2021-09-01
Series: Atmosphere
Subjects: gap-filling; rainfall series; machine learning; Bayesian optimization
Online Access: https://www.mdpi.com/2073-4433/12/9/1158
collection DOAJ
description Missing data are a common problem in hydrometeorological datasets, usually caused by sensor malfunction, deficiencies in record storage and transmission, or other issues in data recovery procedures. These missing values are the primary source of problems when analyzing and modeling the spatial and temporal variability of the data. Accurate gap-filling techniques for rainfall time series are therefore needed to obtain complete datasets, which are crucial for studying the evolution of climate change. In this work, several machine learning models were assessed for gap-filling rainfall data, using different approaches and locations in the semiarid region of Andalusia (Southern Spain). According to the results, using data from neighboring stations located within a 50 km radius clearly outperformed the other approaches assessed, with RMSE (root mean squared error) values up to 1.246 mm/day, MBE (mean bias error) values up to −0.001 mm/day, and R² values up to 0.898. In addition, results at inland sites were better than those at coastal sites in most locations, which points to an effect of distance to the sea on model efficiency (an improvement of up to 63.89% in terms of RMSE). Finally, machine learning (ML) models, especially the multilayer perceptron (MLP), notably outperformed simple linear regression estimates at the coastal sites, whereas at inland locations the improvements were less pronounced.
first_indexed 2024-03-10T07:53:56Z
id doaj.art-24025d4ff7a44b36b4abf67bddf9e1c4
institution Directory Open Access Journal
issn 2073-4433
last_indexed 2024-03-10T07:53:56Z
spelling doaj.art-24025d4ff7a44b36b4abf67bddf9e1c4
2023-11-22T12:00:07Z
eng
MDPI AG, Atmosphere, ISSN 2073-4433
2021-09-01, Volume 12, Issue 9, Article 1158
DOI 10.3390/atmos12091158
Author affiliation (all three authors): Projects Engineering Area, Department of Rural Engineering, University of Córdoba, 14071 Córdoba, Spain
topic gap-filling
rainfall series
machine learning
Bayesian optimization