A New Formula for Faster Computation of the K-Fold Cross-Validation and Good Regularisation Parameter Values in Ridge Regression

In the present paper, we prove a new theorem, resulting in an update formula for linear regression model residuals calculating the exact k-fold cross-validation residuals for any choice of cross-validation strategy without model refitting. The required matrix inversions are limited by the cross-vali...

Bibliographic Details
Main Authors: Kristian Hovde Liland, Joakim Skogholt, Ulf Geir Indahl
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: Cross-validation, GCV, PRESS statistic, ridge regression, SVD, Tikhonov regularisation
Online Access: https://ieeexplore.ieee.org/document/10411898/
author Kristian Hovde Liland
Joakim Skogholt
Ulf Geir Indahl
collection DOAJ
description In the present paper, we prove a new theorem, resulting in an update formula for linear regression model residuals calculating the exact k-fold cross-validation residuals for any choice of cross-validation strategy without model refitting. The required matrix inversions are limited by the cross-validation segment sizes and can be executed with high efficiency in parallel. The well-known formula for leave-one-out cross-validation follows as a special case of the theorem. In situations where the cross-validation segments consist of small groups of repeated measurements, we suggest a heuristic strategy for fast serial approximations of the cross-validated residuals and associated Predicted Residual Sum of Squares ($PRESS$) statistic. We also suggest strategies for efficient estimation of the minimum $PRESS$ value and full $PRESS$ function over a selected interval of regularisation values. The computational effectiveness of the parameter selection for Ridge- and Tikhonov regression modelling resulting from our theoretical findings and heuristic arguments is demonstrated in several applications with real and highly multivariate datasets.
first_indexed 2024-03-08T05:36:12Z
format Article
id doaj.art-b6b3ec69fc6146ecae5df4207c269904
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-08T05:36:12Z
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-b6b3ec69fc6146ecae5df4207c269904
2024-02-06T00:00:50Z
eng
IEEE
IEEE Access
ISSN 2169-3536
2024-01-01
Volume 12, pages 17349-17368
DOI 10.1109/ACCESS.2024.3357097
Document 10411898
A New Formula for Faster Computation of the K-Fold Cross-Validation and Good Regularisation Parameter Values in Ridge Regression
Kristian Hovde Liland, https://orcid.org/0000-0001-6468-9423, Faculty of Science and Technology, Norwegian University of Life Sciences, Ås, Norway
Joakim Skogholt, https://orcid.org/0000-0001-8511-993X, Faculty of Science and Technology, Norwegian University of Life Sciences, Ås, Norway
Ulf Geir Indahl, https://orcid.org/0000-0002-3236-463X, Faculty of Science and Technology, Norwegian University of Life Sciences, Ås, Norway
https://ieeexplore.ieee.org/document/10411898/
Cross-validation
GCV
PRESS statistic
ridge regression
SVD
Tikhonov regularisation
title A New Formula for Faster Computation of the K-Fold Cross-Validation and Good Regularisation Parameter Values in Ridge Regression
topic Cross-validation
GCV
PRESS statistic
ridge regression
SVD
Tikhonov regularisation
url https://ieeexplore.ieee.org/document/10411898/
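
The description in this record mentions computing exact k-fold cross-validation residuals without refitting, with matrix inversions no larger than the segment sizes. The record does not reproduce the paper's theorem, so the sketch below is not the authors' formula. It illustrates the same idea with a well-known identity for ridge regression: with hat matrix H and full-model residuals e, the held-out residuals for a segment S are (I - H_SS)^{-1} e_S. It assumes a model with no intercept and no per-fold centering, which the paper may treat differently. The function names, the test data, and the value lam = 0.5 are all hypothetical choices for illustration.

```python
import numpy as np


def ridge_kfold_residuals(X, y, lam, folds):
    """k-fold CV residuals for ridge regression from a single full-data fit.

    Assumption: no intercept and no per-fold centering.
    `folds` is a list of integer index arrays that partition the samples.
    """
    n, p = X.shape
    # Ridge hat matrix H = X (X'X + lam I)^{-1} X'
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    e = y - H @ y  # residuals of the model fitted on all samples
    e_cv = np.empty(n)
    for idx in folds:
        H_ss = H[np.ix_(idx, idx)]
        # One small linear solve per segment; its size is the segment size.
        e_cv[idx] = np.linalg.solve(np.eye(len(idx)) - H_ss, e[idx])
    return e_cv


def ridge_kfold_residuals_refit(X, y, lam, folds):
    """Reference implementation: refit the ridge model once per held-out fold."""
    n, p = X.shape
    e_cv = np.empty(n)
    for idx in folds:
        train = np.setdiff1d(np.arange(n), idx)
        Xt, yt = X[train], y[train]
        beta = np.linalg.solve(Xt.T @ Xt + lam * np.eye(p), Xt.T @ yt)
        e_cv[idx] = y[idx] - X[idx] @ beta
    return e_cv


# Small synthetic example (made-up data, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=30)
folds = np.array_split(rng.permutation(30), 5)

fast = ridge_kfold_residuals(X, y, 0.5, folds)
slow = ridge_kfold_residuals_refit(X, y, 0.5, folds)
press = float(np.sum(fast ** 2))  # PRESS statistic for this lam
```

With segments of size one, the per-segment solve reduces to dividing each residual by 1 - h_ii, which is the familiar leave-one-out formula. Since the segments are independent of each other, the loop could also be run in parallel.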