Assessing Machine Learning Models for Gap Filling Daily Rainfall Series in a Semiarid Region of Spain
The presence of missing data in hydrometeorological datasets is a common problem, usually due to sensor malfunction, deficiencies in records storage and transmission, or other recovery procedures issues. These missing values are the primary source of problems when analyzing and modeling their spatia...
Main Authors: | Juan Antonio Bellido-Jiménez, Javier Estévez Gualda, Amanda Penélope García-Marín |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-09-01 |
Series: | Atmosphere |
Subjects: | gap-filling, rainfall series, machine learning, Bayesian optimization |
Online Access: | https://www.mdpi.com/2073-4433/12/9/1158 |
_version_ | 1797520236646236160 |
---|---|
author | Juan Antonio Bellido-Jiménez Javier Estévez Gualda Amanda Penélope García-Marín |
author_facet | Juan Antonio Bellido-Jiménez Javier Estévez Gualda Amanda Penélope García-Marín |
author_sort | Juan Antonio Bellido-Jiménez |
collection | DOAJ |
description | The presence of missing data in hydrometeorological datasets is a common problem, usually due to sensor malfunction, deficiencies in record storage and transmission, or other issues in recovery procedures. These missing values are the primary source of problems when analyzing and modeling their spatial and temporal variability. Thus, accurate gap-filling techniques for rainfall time series are necessary to obtain complete datasets, which is crucial in studying the evolution of climate change. In this work, several machine learning models have been assessed for gap-filling rainfall data, using different approaches and locations in the semiarid region of Andalusia (Southern Spain). Based on the results obtained, the use of neighbor data located within a 50 km radius clearly outperformed the rest of the assessed approaches, with RMSE (root mean squared error) values up to 1.246 mm/day, MBE (mean bias error) values up to −0.001 mm/day, and R² values up to 0.898. In addition, inland results outperformed coastal results in most locations, revealing an effect of distance to the sea on model efficiency (up to an improvement of 63.89% in terms of RMSE). Finally, machine learning (ML) models, especially the multilayer perceptron (MLP), notably outperformed simple linear regression estimations at the coastal sites, whereas at inland locations the improvements were not as significant. |
first_indexed | 2024-03-10T07:53:56Z |
format | Article |
id | doaj.art-24025d4ff7a44b36b4abf67bddf9e1c4 |
institution | Directory of Open Access Journals |
issn | 2073-4433 |
language | English |
last_indexed | 2024-03-10T07:53:56Z |
publishDate | 2021-09-01 |
publisher | MDPI AG |
record_format | Article |
series | Atmosphere |
spelling | doaj.art-24025d4ff7a44b36b4abf67bddf9e1c4; 2023-11-22T12:00:07Z; eng; MDPI AG; Atmosphere; 2073-4433; 2021-09-01; 12; 9; 1158; 10.3390/atmos12091158; Assessing Machine Learning Models for Gap Filling Daily Rainfall Series in a Semiarid Region of Spain; Juan Antonio Bellido-Jiménez (0); Javier Estévez Gualda (1); Amanda Penélope García-Marín (2); Projects Engineering Area, Department of Rural Engineering, University of Córdoba, 14071 Córdoba, Spain; Projects Engineering Area, Department of Rural Engineering, University of Córdoba, 14071 Córdoba, Spain; Projects Engineering Area, Department of Rural Engineering, University of Córdoba, 14071 Córdoba, Spain; The presence of missing data in hydrometeorological datasets is a common problem, usually due to sensor malfunction, deficiencies in record storage and transmission, or other issues in recovery procedures. These missing values are the primary source of problems when analyzing and modeling their spatial and temporal variability. Thus, accurate gap-filling techniques for rainfall time series are necessary to obtain complete datasets, which is crucial in studying the evolution of climate change. In this work, several machine learning models have been assessed for gap-filling rainfall data, using different approaches and locations in the semiarid region of Andalusia (Southern Spain). Based on the results obtained, the use of neighbor data located within a 50 km radius clearly outperformed the rest of the assessed approaches, with RMSE (root mean squared error) values up to 1.246 mm/day, MBE (mean bias error) values up to −0.001 mm/day, and R² values up to 0.898. In addition, inland results outperformed coastal results in most locations, revealing an effect of distance to the sea on model efficiency (up to an improvement of 63.89% in terms of RMSE). Finally, machine learning (ML) models, especially the multilayer perceptron (MLP), notably outperformed simple linear regression estimations at the coastal sites, whereas at inland locations the improvements were not as significant.; https://www.mdpi.com/2073-4433/12/9/1158; gap-filling; rainfall series; machine learning; Bayesian optimization |
spellingShingle | Juan Antonio Bellido-Jiménez Javier Estévez Gualda Amanda Penélope García-Marín Assessing Machine Learning Models for Gap Filling Daily Rainfall Series in a Semiarid Region of Spain Atmosphere gap-filling rainfall series machine learning Bayesian optimization |
title | Assessing Machine Learning Models for Gap Filling Daily Rainfall Series in a Semiarid Region of Spain |
title_full | Assessing Machine Learning Models for Gap Filling Daily Rainfall Series in a Semiarid Region of Spain |
title_fullStr | Assessing Machine Learning Models for Gap Filling Daily Rainfall Series in a Semiarid Region of Spain |
title_full_unstemmed | Assessing Machine Learning Models for Gap Filling Daily Rainfall Series in a Semiarid Region of Spain |
title_short | Assessing Machine Learning Models for Gap Filling Daily Rainfall Series in a Semiarid Region of Spain |
title_sort | assessing machine learning models for gap filling daily rainfall series in a semiarid region of spain |
topic | gap-filling rainfall series machine learning Bayesian optimization |
url | https://www.mdpi.com/2073-4433/12/9/1158 |
work_keys_str_mv | AT juanantoniobellidojimenez assessingmachinelearningmodelsforgapfillingdailyrainfallseriesinasemiaridregionofspain AT javierestevezgualda assessingmachinelearningmodelsforgapfillingdailyrainfallseriesinasemiaridregionofspain AT amandapenelopegarciamarin assessingmachinelearningmodelsforgapfillingdailyrainfallseriesinasemiaridregionofspain |
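
The description above reports results as RMSE, MBE, and R² without defining them. The following is a minimal sketch of how such metrics are commonly computed for a gap-filled daily series. It is not taken from the article. The function names, the sign convention for MBE (estimated minus observed), and the use of the coefficient of determination for R² (rather than a squared Pearson correlation) are all assumptions made here for illustration.

```python
import math


def rmse(observed, estimated):
    """Root mean squared error, in the units of the data (e.g. mm/day)."""
    n = len(observed)
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(observed, estimated)) / n)


def mbe(observed, estimated):
    """Mean bias error. Assumed convention: estimated minus observed,
    so a negative value means the estimates run low on average."""
    n = len(observed)
    return sum(e - o for o, e in zip(observed, estimated)) / n


def r2(observed, estimated):
    """Coefficient of determination (assumed definition of R^2)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical values for illustration only (not data from the article).
observed = [0.0, 2.0, 4.0]   # recorded daily rainfall, mm/day
estimated = [0.0, 3.0, 3.0]  # gap-filled estimates, mm/day

print(rmse(observed, estimated))
print(mbe(observed, estimated))
print(r2(observed, estimated))
```

Under these definitions, lower RMSE and MBE closer to zero indicate better estimates, while R² closer to 1 indicates a better fit.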