Influence of Random Forest Hyperparameterization on Short-Term Runoff Forecasting in an Andean Mountain Catchment

The Random Forest (RF) algorithm, a decision-tree-based technique, has become a promising approach for applications addressing runoff forecasting in remote areas. This machine learning approach can overcome the limitations of scarce spatio-temporal data and physical parameters needed for process-bas...

Bibliographic Details
Main Authors: Pablo Contreras, Johanna Orellana-Alvear, Paul Muñoz, Jörg Bendix, Rolando Célleri
Format: Article
Language: English
Published: MDPI AG 2021-02-01
Series: Atmosphere
Subjects: tropical Andes, random forest, machine learning, optimal hyperparameters, runoff forecasting
Online Access: https://www.mdpi.com/2073-4433/12/2/238
author Pablo Contreras
Johanna Orellana-Alvear
Paul Muñoz
Jörg Bendix
Rolando Célleri
collection DOAJ
description The Random Forest (RF) algorithm, a decision-tree-based technique, has become a promising approach for runoff forecasting in remote areas. This machine learning approach can overcome the scarcity of the spatio-temporal data and physical parameters needed for process-based hydrological models. However, the influence of RF hyperparameters is still uncertain and needs to be explored. Therefore, the aim of this study is to analyze the sensitivity of RF runoff forecasting models of varying lead time to the hyperparameters of the algorithm. For this, models were trained using (a) default and (b) extensive hyperparameter combinations, the latter through a grid-search approach that allows the optimal set to be reached. Model performance was assessed with the R², %Bias, and RMSE metrics. We found that: (i) The most influential hyperparameter is the number of trees in the forest; however, the combination of the tree-depth and number-of-features hyperparameters produced the highest variability and instability in the models. (ii) Hyperparameter optimization significantly improved model performance for longer lead times (12 and 24 h). For instance, the performance of the 12-h forecasting model under default RF hyperparameters improved to R² = 0.41 after optimization (gain of 0.17). However, for short lead times (4 h) there was no significant model improvement (0.69 < R² < 0.70). (iii) There is a range of values for each hyperparameter in which the performance of the model is not significantly affected but remains close to the optimum. Thus, a compromise between hyperparameter interactions (i.e., their values) can produce similarly high model performance. Model improvements after optimization can be explained from a hydrological point of view: for lead times longer than the concentration time of the catchment, the generalization ability of the models tends to rely more on hyperparameterization than on what they can learn from the input data. This insight can help in the development of operational early warning systems.
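The description names three evaluation metrics (R², %Bias, RMSE) without giving their formulas. The following is a minimal sketch of how such metrics are commonly computed, not the authors' implementation. The function names are hypothetical, R² is assumed here to be the coefficient of determination (1 minus the ratio of residual to total sum of squares), and %Bias is assumed to be positive when the model overestimates; the article itself may use different conventions.

```python
import math


def rmse(obs, sim):
    """Root mean square error between observed and simulated runoff."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))


def percent_bias(obs, sim):
    """Percent bias; assumed sign convention: positive means overestimation."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)


def r_squared(obs, sim):
    """Coefficient of determination, assumed as 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not data from the study).
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
print(rmse(observed, simulated))
print(percent_bias(observed, simulated))
print(r_squared(observed, simulated))
```

In a grid search, each hyperparameter combination would be scored with metrics like these on held-out data, and the combination with the best score retained.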
first_indexed 2024-03-09T04:53:27Z
format Article
id doaj.art-d5a27fa7569c4cc88e9dcf4ac3c7c642
institution Directory Open Access Journal
issn 2073-4433
language English
last_indexed 2024-03-09T04:53:27Z
publishDate 2021-02-01
publisher MDPI AG
record_format Article
series Atmosphere
spelling doaj.art-d5a27fa7569c4cc88e9dcf4ac3c7c642
2023-12-03T13:08:18Z
eng
MDPI AG
Atmosphere, ISSN 2073-4433, 2021-02-01, vol. 12, no. 2, art. 238
DOI: 10.3390/atmos12020238
Pablo Contreras (Departamento de Recursos Hídricos y Ciencias Ambientales, Universidad de Cuenca, Cuenca EC10207, Ecuador)
Johanna Orellana-Alvear (Departamento de Recursos Hídricos y Ciencias Ambientales, Universidad de Cuenca, Cuenca EC10207, Ecuador)
Paul Muñoz (Departamento de Recursos Hídricos y Ciencias Ambientales, Universidad de Cuenca, Cuenca EC10207, Ecuador)
Jörg Bendix (Laboratory for Climatology and Remote Sensing (LCRS), Faculty of Geography, University of Marburg, D-035032 Marburg, Germany)
Rolando Célleri (Departamento de Recursos Hídricos y Ciencias Ambientales, Universidad de Cuenca, Cuenca EC10207, Ecuador)
title Influence of Random Forest Hyperparameterization on Short-Term Runoff Forecasting in an Andean Mountain Catchment
topic tropical Andes
random forest
machine learning
optimal hyperparameters
runoff forecasting
url https://www.mdpi.com/2073-4433/12/2/238