Investigating Predictability of the TRHR Seasonal Precipitation at Long Lead Times Using a Generalized Regression Model with Regularization

Skillful long-lead climate forecasting is of great importance in managing large water systems and can be made possible by teleconnections between regional climate and large-scale circulations. Recent innovations in machine learning provide powerful tools for exploring linear/nonlinear associations be...

Bibliographic Details
Main Authors: Xiao Peng, Tiejian Li, John D. Albertson
Format: Article
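Language: English
Published: Frontiers Media S.A. 2021-08-01
Series: Frontiers in Earth Science
Subjects: the three-rivers headwater region; seasonal precipitation prediction; teleconnection; pooling; elastic net regression; logistic regression
Online Access: https://www.frontiersin.org/articles/10.3389/feart.2021.724599/full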
author Xiao Peng
Tiejian Li
John D. Albertson
collection DOAJ
description Skillful long-lead climate forecasting is of great importance in managing large water systems and can be made possible by teleconnections between regional climate and large-scale circulations. Recent innovations in machine learning provide powerful tools for exploring linear/nonlinear associations between climate variables. However, while more complex models are hard to interpret physically, simple models can be vulnerable to over-fitting, especially when dealing with highly “non-square” climate data. Here, as a compromise between interpretability and complexity, we propose a regression model that couples pooling with a generalized regression with regularization. The model is tested on estimating Three-Rivers Headwater Region (TRHR) wet-season precipitation from sea surface temperatures (SSTs) at lead times of 0–24 months. The model shows better predictive skill at certain long lead times than commonly used regression methods, including Ordinary Least Squares (OLS), Empirical Orthogonal Function (EOF), and Canonical Correlation Analysis (CCA) regressions. This skill is related to persistent regional correlation patterns between the predictand precipitation and the predictor SSTs, as a correlation analysis confirms. Furthermore, the flexibility of the model is demonstrated with a multinomial regression model, which shows good skill around a lead time of 22 months. Consistent clusters of SSTs are found to contribute to both models. Two SST indices are defined based on the major clusters of predictors and are found to be significantly correlated with the predictand precipitation at the corresponding lead times. In conclusion, the proposed regression model is flexible and handles collinearity well while preserving simplicity and interpretability, and it shows potential as an inexpensive preliminary analysis tool to guide further study using more complex models.
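The coupling of pooling with a regularized regression described above can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, average pooling over non-overlapping blocks is an assumption (the abstract does not say which pooling is used), and the names average_pool, soft_threshold, and elastic_net are hypothetical. The elastic net is fitted by plain coordinate descent in pure Python so the example has no dependencies.

```python
import random


def average_pool(grid, k):
    """Average non-overlapping k x k blocks of a 2-D grid (list of rows).

    Returns the block means as a flat feature vector. Average pooling is
    an assumption made for illustration only.
    """
    rows, cols = len(grid) // k, len(grid[0]) // k
    return [
        sum(grid[r * k + i][c * k + j] for i in range(k) for j in range(k)) / (k * k)
        for r in range(rows)
        for c in range(cols)
    ]


def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal operator of the L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=200):
    """Coordinate descent for the elastic net objective

        (1 / 2n) * ||y - Xw||^2 + lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)

    X and y are assumed to be centered, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                w[j] = new_w
    return w


# Synthetic "non-square" data: 30 years, 64 grid points per year (hypothetical).
random.seed(0)
n_years, size, k = 30, 8, 4
fields, precip = [], []
for _ in range(n_years):
    grid = [[random.gauss(0, 1) for _ in range(size)] for _ in range(size)]
    # Precipitation depends only on the mean of the top-left k x k block.
    signal = sum(grid[i][j] for i in range(k) for j in range(k)) / (k * k)
    fields.append(grid)
    precip.append(2.0 * signal + random.gauss(0, 0.05))

# Pooling reduces 64 collinearity-prone predictors to 4 regional means.
X = [average_pool(g, k) for g in fields]
n_features = len(X[0])

# Center predictors and predictand before fitting.
means = [sum(row[j] for row in X) / n_years for j in range(n_features)]
X = [[row[j] - means[j] for j in range(n_features)] for row in X]
y_mean = sum(precip) / n_years
y = [v - y_mean for v in precip]

w = elastic_net(X, y, lam=0.02, alpha=0.9)
print("coefficients per pooled region:", [round(v, 3) for v in w])
```

In this toy setup the first pooled region carries the signal, so its coefficient should dominate while the penalty shrinks the others toward zero. The values of lam and alpha are arbitrary; in practice they would be chosen by cross-validation.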
first_indexed 2024-12-19T15:01:08Z
format Article
id doaj.art-5cf078ee066b42b7bd83254914b98f73
institution Directory Open Access Journal
issn 2296-6463
language English
last_indexed 2024-12-19T15:01:08Z
publishDate 2021-08-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Earth Science
spelling Record doaj.art-5cf078ee066b42b7bd83254914b98f73, record timestamp 2022-12-21T20:16:34Z
Volume 9, article 724599, DOI 10.3389/feart.2021.724599
Xiao Peng: School of Civil and Environmental Engineering, Cornell University, Ithaca, NY, United States
Tiejian Li: Department of Hydraulic Engineering, Tsinghua University, Beijing, China
John D. Albertson: School of Civil and Environmental Engineering, Cornell University, Ithaca, NY, United States
title Investigating Predictability of the TRHR Seasonal Precipitation at Long Lead Times Using a Generalized Regression Model with Regularization
topic the three-rivers headwater region
seasonal precipitation prediction
teleconnection
pooling
elastic net regression
logistic regression
url https://www.frontiersin.org/articles/10.3389/feart.2021.724599/full