A New Formula for Faster Computation of the K-Fold Cross-Validation and Good Regularisation Parameter Values in Ridge Regression
In the present paper, we prove a new theorem, resulting in an update formula for linear regression model residuals calculating the exact k-fold cross-validation residuals for any choice of cross-validation strategy without model refitting. The required matrix inversions are limited by the cross-vali...
Main Authors: | Kristian Hovde Liland, Joakim Skogholt, Ulf Geir Indahl |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2024-01-01 |
Series: | IEEE Access |
Subjects: | Cross-validation, GCV, PRESS statistic, ridge regression, SVD, Tikhonov regularisation |
Online Access: | https://ieeexplore.ieee.org/document/10411898/ |
_version_ | 1797323937995030528 |
---|---|
author | Kristian Hovde Liland Joakim Skogholt Ulf Geir Indahl |
author_sort | Kristian Hovde Liland |
collection | DOAJ |
description | In the present paper, we prove a new theorem, resulting in an update formula for linear regression model residuals calculating the exact k-fold cross-validation residuals for any choice of cross-validation strategy without model refitting. The required matrix inversions are limited by the cross-validation segment sizes and can be executed with high efficiency in parallel. The well-known formula for leave-one-out cross-validation follows as a special case of the theorem. In situations where the cross-validation segments consist of small groups of repeated measurements, we suggest a heuristic strategy for fast serial approximations of the cross-validated residuals and associated Predicted Residual Sum of Squares ($PRESS$) statistic. We also suggest strategies for efficient estimation of the minimum $PRESS$ value and full $PRESS$ function over a selected interval of regularisation values. The computational effectiveness of the parameter selection for Ridge- and Tikhonov regression modelling resulting from our theoretical findings and heuristic arguments is demonstrated in several applications with real and highly multivariate datasets. |
first_indexed | 2024-03-08T05:36:12Z |
format | Article |
id | doaj.art-b6b3ec69fc6146ecae5df4207c269904 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-08T05:36:12Z |
publishDate | 2024-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-b6b3ec69fc6146ecae5df4207c269904; 2024-02-06T00:00:50Z; eng; IEEE; IEEE Access; 2169-3536; 2024-01-01; 12; 17349; 17368; 10.1109/ACCESS.2024.3357097; 10411898; Kristian Hovde Liland (https://orcid.org/0000-0001-6468-9423); Joakim Skogholt (https://orcid.org/0000-0001-8511-993X); Ulf Geir Indahl (https://orcid.org/0000-0002-3236-463X); Faculty of Science and Technology, Norwegian University of Life Sciences, Ås, Norway; https://ieeexplore.ieee.org/document/10411898/; Cross-validation; GCV; PRESS statistic; ridge regression; SVD; Tikhonov regularisation |
title | A New Formula for Faster Computation of the K-Fold Cross-Validation and Good Regularisation Parameter Values in Ridge Regression |
topic | Cross-validation GCV PRESS statistic ridge regression SVD Tikhonov regularisation |
url | https://ieeexplore.ieee.org/document/10411898/ |
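The description above states that exact k-fold cross-validation residuals can be obtained from a single fit, with matrix inversions no larger than the segment sizes. This record does not reproduce the theorem itself, so the following is only a minimal sketch of the standard hat-matrix block identity for ridge regression, e_cv,k = (I - H_kk)^(-1) e_k, which is consistent with that description and reduces to e_i / (1 - h_ii) for leave-one-out. It assumes no intercept or centring and a positive regularisation value; the function names, the synthetic data, and the NumPy implementation are illustrative assumptions, not the authors' code.

```python
import numpy as np


def ridge_cv_residuals(X, y, lam, folds):
    """Exact k-fold CV residuals for ridge regression from one full fit.

    Assumes no intercept/centring and lam > 0. `folds` is a list of index
    arrays that partition range(n). All names here are illustrative.
    """
    n, p = X.shape
    A = X.T @ X + lam * np.eye(p)
    H = X @ np.linalg.solve(A, X.T)   # hat matrix of the full-data fit
    e = y - H @ y                     # ordinary (fitted) residuals
    e_cv = np.empty_like(e)
    for idx in folds:
        # One small solve per segment: (I - H_kk) e_cv_k = e_k
        I_minus_Hkk = np.eye(len(idx)) - H[np.ix_(idx, idx)]
        e_cv[idx] = np.linalg.solve(I_minus_Hkk, e[idx])
    return e_cv


def ridge_cv_residuals_refit(X, y, lam, folds):
    """Reference implementation: refit the model once per held-out segment."""
    n, p = X.shape
    e_cv = np.empty(n)
    for idx in folds:
        keep = np.ones(n, dtype=bool)
        keep[idx] = False
        A = X[keep].T @ X[keep] + lam * np.eye(p)
        beta = np.linalg.solve(A, X[keep].T @ y[keep])
        e_cv[idx] = y[idx] - X[idx] @ beta
    return e_cv


# Synthetic data purely for illustration.
rng = np.random.default_rng(0)
n, p, lam = 60, 8, 0.5
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)
folds = np.array_split(rng.permutation(n), 5)

e_fast = ridge_cv_residuals(X, y, lam, folds)
e_slow = ridge_cv_residuals_refit(X, y, lam, folds)
press = float(np.sum(e_fast ** 2))   # PRESS statistic for this lam
```

The per-segment solves are independent of one another, which is what makes the parallel execution mentioned in the description possible. Forming the full n-by-n hat matrix, as done here for brevity, would be wasteful for large n; only the diagonal blocks are actually needed.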