Comparison of Tree-Based Ensemble Algorithms for Merging Satellite and Earth-Observed Precipitation Data at the Daily Time Scale
Merging satellite products and ground-based measurements is often required for obtaining precipitation datasets that simultaneously cover large regions with high density and are more accurate than pure satellite precipitation products. Machine and statistical learning regression algorithms are regul...
Main Authors: | Georgia Papacharalampous, Hristos Tyralis, Anastasios Doulamis, Nikolaos Doulamis |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-02-01 |
Series: | Hydrology |
Subjects: | contiguous US; gradient boosting machines; IMERG; machine learning; PERSIANN; random forests |
Online Access: | https://www.mdpi.com/2306-5338/10/2/50 |
_version_ | 1827757141589491712 |
---|---|
author | Georgia Papacharalampous; Hristos Tyralis; Anastasios Doulamis; Nikolaos Doulamis |
author_facet | Georgia Papacharalampous; Hristos Tyralis; Anastasios Doulamis; Nikolaos Doulamis |
author_sort | Georgia Papacharalampous |
collection | DOAJ |
description | Merging satellite products and ground-based measurements is often required for obtaining precipitation datasets that simultaneously cover large regions with high density and are more accurate than pure satellite precipitation products. Machine and statistical learning regression algorithms are regularly utilized in this endeavor. At the same time, tree-based ensemble algorithms are adopted in various fields for solving regression problems with high accuracy and low computational costs. Still, information on which tree-based ensemble algorithm to select for correcting satellite precipitation products for the contiguous United States (US) at the daily time scale is missing from the literature. In this study, we worked towards filling this methodological gap by conducting an extensive comparison between three algorithms of the category of interest, specifically between random forests, gradient boosting machines (gbm) and extreme gradient boosting (XGBoost). We used daily data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) and the IMERG (Integrated Multi-satellitE Retrievals for GPM) gridded datasets. We also used earth-observed precipitation data from the Global Historical Climatology Network daily (GHCNd) database. The experiments referred to the entire contiguous US and additionally included the application of the linear regression algorithm for benchmarking purposes. The results suggest that XGBoost is the best-performing tree-based ensemble algorithm among those compared. Indeed, the mean relative improvements that it provided with respect to linear regression (for the case that the latter algorithm was run with the same predictors as XGBoost) are equal to 52.66%, 56.26% and 64.55% (for three different predictor sets), while the respective values are 37.57%, 53.99% and 54.39% for random forests, and 34.72%, 47.99% and 62.61% for gbm. Lastly, the results suggest that IMERG is more useful than PERSIANN in the context investigated. |
first_indexed | 2024-03-11T08:44:46Z |
format | Article |
id | doaj.art-3cad235542864f8cb5199a70708c0c92 |
institution | Directory of Open Access Journals |
issn | 2306-5338 |
language | English |
last_indexed | 2024-03-11T08:44:46Z |
publishDate | 2023-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Hydrology |
title | Comparison of Tree-Based Ensemble Algorithms for Merging Satellite and Earth-Observed Precipitation Data at the Daily Time Scale |
title_sort | comparison of tree based ensemble algorithms for merging satellite and earth observed precipitation data at the daily time scale |
topic | contiguous US; gradient boosting machines; IMERG; machine learning; PERSIANN; random forests |
url | https://www.mdpi.com/2306-5338/10/2/50 |
work_keys_str_mv | AT georgiapapacharalampous comparisonoftreebasedensemblealgorithmsformergingsatelliteandearthobservedprecipitationdataatthedailytimescale AT hristostyralis comparisonoftreebasedensemblealgorithmsformergingsatelliteandearthobservedprecipitationdataatthedailytimescale AT anastasiosdoulamis comparisonoftreebasedensemblealgorithmsformergingsatelliteandearthobservedprecipitationdataatthedailytimescale AT nikolaosdoulamis comparisonoftreebasedensemblealgorithmsformergingsatelliteandearthobservedprecipitationdataatthedailytimescale |
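
The description in this record reports mean relative improvements over linear regression, but it does not define the underlying score. As a rough sketch only, the Python snippet below assumes an error score for which lower is better (for example, a squared error) and assumes that the improvement is computed per evaluation case and then averaged. Neither assumption is confirmed by this record. The function names and the toy numbers are hypothetical and are not taken from the article.

```python
from statistics import mean


def relative_improvement(benchmark_error: float, model_error: float) -> float:
    """Percentage by which model_error improves on benchmark_error.

    Assumes a score for which lower is better (an assumption, not
    something stated in the catalog record).
    """
    if benchmark_error <= 0:
        raise ValueError("benchmark_error must be positive")
    return 100.0 * (benchmark_error - model_error) / benchmark_error


def mean_relative_improvement(benchmark_errors, model_errors) -> float:
    """Average the per-case relative improvements (in percent)."""
    if len(benchmark_errors) != len(model_errors):
        raise ValueError("both inputs must have the same length")
    return mean(
        relative_improvement(b, m)
        for b, m in zip(benchmark_errors, model_errors)
    )


# Toy, made-up error scores for two evaluation cases (not from the article).
linear_regression_errors = [4.0, 2.0]
tree_ensemble_errors = [2.0, 1.5]

print(mean_relative_improvement(linear_regression_errors, tree_ensemble_errors))
```

Under these assumptions, a positive value means the tree-based ensemble has a lower error than the linear regression benchmark, which is how figures such as those quoted in the description would be read.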