Greedy knot selection algorithm for restricted cubic spline regression

Non-linear regression modeling is common in epidemiology for prediction purposes or estimating relationships between predictor and response variables. Restricted cubic spline (RCS) regression is one such method, for example, highly relevant to Cox proportional hazard regression model analysis. RCS r...

Bibliographic Details
Main Authors: Jo Inge Arnes, Alexander Hapfelmeier, Alexander Horsch, Tonje Braaten
Format: Article
Language: English
Published: Frontiers Media S.A. 2023-12-01
Series: Frontiers in Epidemiology
Subjects: model selection, non-linear regression, prediction, restricted cubic splines, algorithm
Online Access: https://www.frontiersin.org/articles/10.3389/fepid.2023.1283705/full
author Jo Inge Arnes
Alexander Hapfelmeier
Alexander Horsch
Tonje Braaten
collection DOAJ
description Non-linear regression modeling is common in epidemiology for prediction purposes or estimating relationships between predictor and response variables. Restricted cubic spline (RCS) regression is one such method, for example, highly relevant to Cox proportional hazard regression model analysis. RCS regression uses third-order polynomials joined at knot points to model non-linear relationships. The standard approach is to place knots by a regular sequence of quantiles between the outer boundaries. A regression curve can easily be fitted to the sample using a relatively high number of knots. The problem is then overfitting, where a regression model has a good fit to the given sample but does not generalize well to other samples. A low knot count is thus preferred. However, the standard knot selection process can lead to underperformance in the sparser regions of the predictor variable, especially when using a low number of knots. It can also lead to overfitting in the denser regions. We present a simple greedy search algorithm using a backward method for knot selection that shows reduced prediction error and Bayesian information criterion scores compared to the standard knot selection process in simulation experiments. We have implemented the algorithm as part of an open-source R-package, knutar.
first_indexed 2024-03-08T22:19:35Z
format Article
id doaj.art-b0f8f426661f4162ab9796e2ab17c41d
institution Directory Open Access Journal
issn 2674-1199
language English
last_indexed 2024-03-08T22:19:35Z
publishDate 2023-12-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Epidemiology
spelling doaj.art-b0f8f426661f4162ab9796e2ab17c41d
2023-12-18T15:57:18Z
eng
Frontiers Media S.A.
Frontiers in Epidemiology
2674-1199
2023-12-01
3
10.3389/fepid.2023.1283705
1283705
Greedy knot selection algorithm for restricted cubic spline regression
Jo Inge Arnes [0]
Alexander Hapfelmeier [1]
Alexander Horsch [2]
Tonje Braaten [3]
[0] Department of Computer Science, Faculty of Science and Technology, UiT The Arctic University of Norway, Tromsø, Norway
[1] Institute of AI and Informatics in Medicine, TUM School of Medicine, Technical University of Munich, Munich, Germany
[2] Department of Computer Science, Faculty of Science and Technology, UiT The Arctic University of Norway, Tromsø, Norway
[3] Department of Community Medicine, Faculty of Health Sciences, UiT The Arctic University of Norway, Tromsø, Norway
https://www.frontiersin.org/articles/10.3389/fepid.2023.1283705/full
model selection
non-linear regression
prediction
restricted cubic splines
algorithm
title Greedy knot selection algorithm for restricted cubic spline regression
topic model selection
non-linear regression
prediction
restricted cubic splines
algorithm
url https://www.frontiersin.org/articles/10.3389/fepid.2023.1283705/full