Comparison of Tree-Based Ensemble Algorithms for Merging Satellite and Earth-Observed Precipitation Data at the Daily Time Scale

Bibliographic Details
Main Authors: Georgia Papacharalampous, Hristos Tyralis, Anastasios Doulamis, Nikolaos Doulamis
Format: Article
Language: English
Published: MDPI AG 2023-02-01
Series: Hydrology
Online Access: https://www.mdpi.com/2306-5338/10/2/50
collection DOAJ
description Merging satellite products and ground-based measurements is often required to obtain precipitation datasets that cover large regions at high spatial density and are more accurate than pure satellite precipitation products. Machine and statistical learning regression algorithms are regularly used for this purpose. At the same time, tree-based ensemble algorithms are adopted in various fields for solving regression problems with high accuracy and low computational cost. Still, the literature offers no guidance on which tree-based ensemble algorithm to select for correcting satellite precipitation products for the contiguous United States (US) at the daily time scale. In this study, we worked towards filling this methodological gap by conducting an extensive comparison of three such algorithms: random forests, gradient boosting machines (gbm) and extreme gradient boosting (XGBoost). We used daily data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) and the IMERG (Integrated Multi-satellitE Retrievals for GPM) gridded datasets. We also used earth-observed precipitation data from the Global Historical Climatology Network daily (GHCNd) database. The experiments covered the entire contiguous US and additionally included linear regression as a benchmark. The results suggest that XGBoost is the best-performing tree-based ensemble algorithm among those compared. Indeed, its mean relative improvements with respect to linear regression (when the latter was run with the same predictors as XGBoost) are 52.66%, 56.26% and 64.55% (for three different predictor sets), while the respective values are 37.57%, 53.99% and 54.39% for random forests, and 34.72%, 47.99% and 62.61% for gbm. Lastly, the results suggest that IMERG is more useful than PERSIANN in the context investigated.
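The description reports relative improvements over a linear regression benchmark but does not define the error metric. As a minimal sketch of how such a percentage could be computed, the following assumes mean squared error as the metric (an assumption, not stated in this record). The function names and the toy values are hypothetical and are not taken from the study.

```python
from statistics import mean


def mean_squared_error(observed, predicted):
    """Mean squared error between two equally long sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return mean((o - p) ** 2 for o, p in zip(observed, predicted))


def relative_improvement(benchmark_error, model_error):
    """Percentage reduction in error relative to the benchmark."""
    if benchmark_error <= 0:
        raise ValueError("benchmark_error must be positive")
    return 100.0 * (benchmark_error - model_error) / benchmark_error


# Hypothetical toy values (mm/day); NOT data from the study.
gauge = [0.0, 2.0, 5.0, 1.0]        # ground-based observations
linear_pred = [1.0, 1.0, 3.0, 2.0]  # benchmark (linear regression) predictions
tree_pred = [0.5, 2.0, 4.0, 1.0]    # tree-based ensemble predictions

benchmark_mse = mean_squared_error(gauge, linear_pred)
tree_mse = mean_squared_error(gauge, tree_pred)
improvement = relative_improvement(benchmark_mse, tree_mse)
print(f"Relative improvement over benchmark: {improvement:.2f}%")
```

A positive value means the tree-based model has a lower error than the benchmark. Averaging such values over several evaluation cases would give a mean relative improvement in the sense assumed here.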
first_indexed 2024-03-11T08:44:46Z
id doaj.art-3cad235542864f8cb5199a70708c0c92
institution Directory Open Access Journal
issn 2306-5338
last_indexed 2024-03-11T08:44:46Z
spelling doaj.art-3cad235542864f8cb5199a70708c0c92; 2023-11-16T20:51:49Z; eng; MDPI AG; Hydrology; ISSN 2306-5338; 2023-02-01; volume 10, issue 2, article 50; DOI 10.3390/hydrology10020050
affiliation (all four authors) Department of Topography, School of Rural, Surveying and Geoinformatics Engineering, National Technical University of Athens, Iroon Polytechniou 5, 157 80 Zografou, Greece
topic contiguous US
gradient boosting machines
IMERG
machine learning
PERSIANN
random forests