Effects of influential points and sample size on the selection and replicability of multivariable fractional polynomial models

Bibliographic Details
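Main Authors: Willi Sauerbrei, Edwin Kipruto, James Balmford
Affiliation: Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg (all three authors)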
Format: Article
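Language: English
Published: BMC, 2023-04-01
Series: Diagnostic and Prognostic Research (ISSN 2397-7523)
Subjects: Continuous variable; Fractional polynomial; Influential point; Model building; Sample size; Simulated data
Online Access: https://doi.org/10.1186/s41512-023-00145-1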

Abstract

Background: The multivariable fractional polynomial (MFP) approach combines variable selection using backward elimination with a function selection procedure (FSP) for fractional polynomial (FP) functions. It is a relatively simple approach which can be easily understood without advanced training in statistical modeling. For continuous variables, a closed test procedure is used to decide between no effect, linear, FP1, or FP2 functions. Influential points (IPs) and small sample sizes can both have a strong impact on a selected function and MFP model.

Methods: We used simulated data with six continuous and four categorical predictors to illustrate approaches which can help to identify IPs with an influence on function selection and the MFP model. The approaches use leave-one-out and leave-two-out analyses and two related techniques for a multivariable assessment. In eight subsamples, we also investigated the effects of sample size and model replicability, the latter by using three non-overlapping subsamples with the same sample size. For better illustration, a structured profile was used to provide an overview of all analyses conducted.

Results: The results showed that one or more IPs can drive the functions and models selected. In addition, with a small sample size, MFP was not able to detect some non-linear functions, and the selected model differed substantially from the true underlying model. However, when the sample size was relatively large and regression diagnostics were carefully conducted, MFP selected functions or models that were similar to the underlying true model.

Conclusions: For smaller sample sizes, IPs and low power are important reasons why the MFP approach may not be able to identify underlying functional relationships for continuous variables, and selected models might differ substantially from the true model. However, for larger sample sizes, a carefully conducted MFP analysis is often a suitable way to select a multivariable regression model that includes continuous variables. In such a case, MFP can be the preferred approach to derive a multivariable descriptive model.
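The abstract describes the closed test procedure and the leave-one-out idea only in words. The following is a minimal sketch of both, assuming a single positive continuous predictor, Gaussian errors, and a 5% significance level. It uses the conventional FP power set {-2, -1, -0.5, 0, 0.5, 1, 2, 3} (power 0 meaning log x) and compares deviances computed as n*log(RSS). It is not the authors' implementation and omits the backward elimination over several predictors that MFP adds. All function and constant names (`fp_columns`, `rss`, `best_fp`, `fsp`, `leave_one_out`, `CHI2_05`) are hypothetical and chosen for illustration only.

```python
import math
from itertools import combinations_with_replacement

# Conventional FP power set; power 0 denotes log(x).
POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)
# Chi-square 5% critical values by degrees of freedom (hard-coded to stay stdlib-only).
CHI2_05 = {2: 5.991, 3: 7.815, 4: 9.488}


def fp_term(v, p):
    return math.log(v) if p == 0 else v ** p


def fp_columns(x, powers):
    """FP design columns for x > 0; a repeated power p adds x^p * log(x)."""
    cols, prev = [], None
    for p in powers:
        col = [fp_term(v, p) for v in x]
        if p == prev:
            col = [c * math.log(v) for c, v in zip(col, x)]
        cols.append(col)
        prev = p
    return cols


def rss(y, cols):
    """Residual sum of squares of y on an intercept plus cols (modified Gram-Schmidt)."""
    basis = []
    for col in [[1.0] * len(y)] + cols:
        v = list(col)
        scale = math.sqrt(sum(a * a for a in v))
        for q in basis:
            dot = sum(a * b for a, b in zip(q, v))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-10 * scale:  # drop numerically collinear columns
            basis.append([a / norm for a in v])
    r = list(y)
    for q in basis:
        dot = sum(a * b for a, b in zip(q, r))
        r = [a - dot * b for a, b in zip(r, q)]
    return sum(a * a for a in r)


def deviance(y, cols):
    # Gaussian deviance up to an additive constant; the floor guards against log(0).
    return len(y) * math.log(max(rss(y, cols), 1e-300))


def best_fp(x, y, degree):
    """Best FP1 (degree=1) or FP2 (degree=2) by deviance: returns (deviance, powers)."""
    return min((deviance(y, fp_columns(x, ps)), ps)
               for ps in combinations_with_replacement(POWERS, degree))


def fsp(x, y, crit=CHI2_05):
    """Closed test: best FP2 vs null (4 df), then vs linear (3 df), then vs best FP1 (2 df)."""
    d_fp2, p_fp2 = best_fp(x, y, 2)
    if deviance(y, []) - d_fp2 < crit[4]:
        return "null", ()
    if deviance(y, [list(x)]) - d_fp2 < crit[3]:
        return "linear", (1,)
    d_fp1, p_fp1 = best_fp(x, y, 1)
    if d_fp1 - d_fp2 < crit[2]:
        return "FP1", p_fp1
    return "FP2", p_fp2


def leave_one_out(x, y):
    """Function type selected on all data, plus indices whose single deletion changes it."""
    full = fsp(x, y)[0]
    flagged = [i for i in range(len(x))
               if fsp(x[:i] + x[i + 1:], y[:i] + y[i + 1:])[0] != full]
    return full, flagged
```

For example, with `x = [float(i) for i in range(1, 41)]` and a U-shaped outcome `y = [10 - 2*v + 0.1*v*v + 0.01*math.sin(7*v) for v in x]`, no FP1 function (all of which are monotone) can follow the curve, so `fsp` selects an FP2 function, and no single deletion changes that choice. A leave-two-out check follows the same pattern over all n(n-1)/2 pairs of indices, which is considerably more expensive.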