Investigating Predictability of the TRHR Seasonal Precipitation at Long Lead Times Using a Generalized Regression Model with Regularization
Skillful long-lead climate forecasting is of great importance in managing large water systems and can be made possible by teleconnections between regional climate and large-scale circulations. Recent innovations in machine learning provide powerful tools for exploring linear/nonlinear associations be...
Main Authors: | Xiao Peng, Tiejian Li, John D. Albertson |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A. 2021-08-01 |
Series: | Frontiers in Earth Science |
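Subjects: | the three-rivers headwater region; seasonal precipitation prediction; teleconnection; pooling; elastic net regression; logistic regression |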
Online Access: | https://www.frontiersin.org/articles/10.3389/feart.2021.724599/full |
_version_ | 1818881392580231168 |
---|---|
author | Xiao Peng; Tiejian Li; John D. Albertson |
author_facet | Xiao Peng; Tiejian Li; John D. Albertson |
author_sort | Xiao Peng |
collection | DOAJ |
description | Skillful long-lead climate forecasting is of great importance in managing large water systems and can be made possible by teleconnections between regional climate and large-scale circulations. Recent innovations in machine learning provide powerful tools for exploring linear and nonlinear associations between climate variables. However, while the more complex models are hard to interpret physically, the simple models can be vulnerable to over-fitting, especially when dealing with highly “non-square” climate data, in which predictors far outnumber observations. Here, as a compromise between interpretability and complexity, we propose a regression model that couples pooling with a generalized regression with regularization. The model is tested by estimating the Three-Rivers Headwater Region (TRHR) wet-season precipitation from sea surface temperatures (SSTs) at lead times of 0–24 months. The model shows better predictive skill at certain long lead times than some commonly used regression methods, including Ordinary Least Squares (OLS), Empirical Orthogonal Function (EOF), and Canonical Correlation Analysis (CCA) regressions. The high skill is found to relate to persistent regional correlation patterns between the predictand precipitation and the predictor SSTs, as also confirmed by a correlation analysis. Furthermore, the flexibility of the model is demonstrated using a multinomial regression model, which shows good skill around the long lead time of 22 months. Consistent clusters of SSTs are found to contribute to both models. Two SST indices are defined based on the major clusters of predictors and are found to be significantly correlated with the predictand precipitation at the corresponding lead times. In conclusion, the proposed regression model demonstrates great flexibility and advantages in dealing with collinearity while preserving simplicity and interpretability, and shows potential as a cheap preliminary analysis tool to guide further study using more complex models. |
first_indexed | 2024-12-19T15:01:08Z |
format | Article |
id | doaj.art-5cf078ee066b42b7bd83254914b98f73 |
institution | Directory of Open Access Journals |
issn | 2296-6463 |
language | English |
last_indexed | 2024-12-19T15:01:08Z |
publishDate | 2021-08-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Earth Science |
spelling | doaj.art-5cf078ee066b42b7bd83254914b98f73; 2022-12-21T20:16:34Z; eng; Frontiers Media S.A.; Frontiers in Earth Science; 2296-6463; 2021-08-01; 9; 10.3389/feart.2021.724599; 724599; Investigating Predictability of the TRHR Seasonal Precipitation at Long Lead Times Using a Generalized Regression Model with Regularization; Xiao Peng (0), Tiejian Li (1), John D. Albertson (2); (0) School of Civil and Environmental Engineering, Cornell University, Ithaca, NY, United States; (1) Department of Hydraulic Engineering, Tsinghua University, Beijing, China; (2) School of Civil and Environmental Engineering, Cornell University, Ithaca, NY, United States; https://www.frontiersin.org/articles/10.3389/feart.2021.724599/full; the three-rivers headwater region; seasonal precipitation prediction; teleconnection; pooling; elastic net regression; logistic regression |
title | Investigating Predictability of the TRHR Seasonal Precipitation at Long Lead Times Using a Generalized Regression Model with Regularization |
topic | the three-rivers headwater region; seasonal precipitation prediction; teleconnection; pooling; elastic net regression; logistic regression |
url | https://www.frontiersin.org/articles/10.3389/feart.2021.724599/full |
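
The description above outlines a two-step idea: pool a gridded predictor field, then fit a regularized regression on the pooled values. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes that pooling means averaging non-overlapping grid blocks and that the regularized regression is an elastic net (taken from the record's keywords), fitted here by plain coordinate descent. All function names, grid sizes, penalty values, and the synthetic data are hypothetical and chosen for illustration.

```python
import random


def average_pool(field, size):
    """Average non-overlapping size x size blocks of a 2-D grid (list of rows)."""
    rows, cols = len(field), len(field[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        out_row = []
        for c in range(0, cols - size + 1, size):
            block = [field[r + i][c + j] for i in range(size) for j in range(size)]
            out_row.append(sum(block) / len(block))
        pooled.append(out_row)
    return pooled


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the L1 proximal step)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def center_columns(X):
    """Subtract each column mean so that no intercept term is needed."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def elastic_net(X, y, lam, alpha, n_iter=200):
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    Assumes X columns and y are already centered."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, with b starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (beta_j excluded)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Synthetic stand-in for SST anomalies: 30 "years" on an 8 x 12 grid
# (96 raw predictors for 30 samples), pooled 4 x 4 down to 6 predictors.
random.seed(0)
n_years, n_lat, n_lon, pool = 30, 8, 12, 4
X_raw, y_raw = [], []
for _ in range(n_years):
    sst = [[random.gauss(0.0, 1.0) for _ in range(n_lon)] for _ in range(n_lat)]
    features = [v for row in average_pool(sst, pool) for v in row]
    X_raw.append(features)
    # Hypothetical predictand driven by the first pooled block plus noise
    y_raw.append(2.0 * features[0] + random.gauss(0.0, 0.05))

X = center_columns(X_raw)
y_mean = sum(y_raw) / n_years
y = [v - y_mean for v in y_raw]
beta = elastic_net(X, y, lam=0.02, alpha=0.5)
print("pooled-predictor coefficients:", [round(b, 3) for b in beta])
```

Pooling reduces the number of collinear grid-point predictors before the fit, and the mixed L1/L2 penalty shrinks the remaining coefficients; in practice the penalty strength and mixing parameter would be chosen by cross-validation rather than fixed as here.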