Effects of influential points and sample size on the selection and replicability of multivariable fractional polynomial models
Main Authors: | Willi Sauerbrei, Edwin Kipruto, James Balmford |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2023-04-01 |
Series: | Diagnostic and Prognostic Research |
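Subjects: | Continuous variable; Fractional polynomial; Influential point; Model building; Sample size; Simulated data |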
Online Access: | https://doi.org/10.1186/s41512-023-00145-1 |
author | Willi Sauerbrei, Edwin Kipruto, James Balmford |
collection | DOAJ |
description | Background: The multivariable fractional polynomial (MFP) approach combines variable selection by backward elimination with a function selection procedure (FSP) for fractional polynomial (FP) functions. It is a relatively simple approach that can be understood without advanced training in statistical modeling. For each continuous variable, a closed test procedure decides between no effect, a linear function, an FP1 function, or an FP2 function. Influential points (IPs) and small sample sizes can both strongly affect the selected function and the resulting MFP model. Methods: We used simulated data with six continuous and four categorical predictors to illustrate approaches that help identify IPs influencing function selection and the MFP model. These approaches use leave-one-out and leave-two-out analyses and two related techniques for a multivariable assessment. Using eight subsamples, we also investigated the effect of sample size and assessed model replicability, the latter with three non-overlapping subsamples of equal size. To aid illustration, a structured profile gives an overview of all analyses conducted. Results: One or more IPs can drive the selected functions and models. In addition, with a small sample size, MFP failed to detect some non-linear functions, and the selected model differed substantially from the true underlying model. However, when the sample size was relatively large and regression diagnostics were carefully conducted, MFP selected functions and models similar to the true underlying model. Conclusions: For smaller sample sizes, IPs and low power are important reasons why the MFP approach may fail to identify the underlying functional relationships of continuous variables, and the selected model may differ substantially from the true model. However, for larger sample sizes, a carefully conducted MFP analysis is often a suitable way to select a multivariable regression model that includes continuous variables. In such cases, MFP can be the preferred approach for deriving a multivariable descriptive model. |
first_indexed | 2024-04-09T16:20:36Z |
id | doaj.art-95dfafb53054422485a7efb4799a79fe |
institution | Directory of Open Access Journals |
issn | 2397-7523 |
last_indexed | 2024-04-09T16:20:36Z |
affiliation | Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg (all three authors) |
topic | Continuous variable; Fractional polynomial; Influential point; Model building; Sample size; Simulated data |
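
The abstract summarizes the function selection procedure only at a high level. The Python sketch below illustrates its univariable core under stated assumptions. It assumes a Gaussian linear model with one positive continuous predictor and the conventional FP power set {−2, −1, −0.5, 0, 0.5, 1, 2, 3}, with 0 denoting log(x). It computes deviance as n·log(RSS/n) and applies the closed test sequence FP2 versus null (4 df), FP2 versus linear (3 df), and FP2 versus FP1 (2 df) at a nominal α = 0.05. It also adds a simple leave-one-out check that flags observations whose removal changes the selected function class, in the spirit of the leave-one-out diagnostics the abstract mentions. All function names (`fsp`, `best_fp`, `leave_one_out_changes`, and the helpers) are hypothetical. This is not the authors' implementation, and it omits backward elimination across several variables, the leave-two-out analyses, and the multivariable techniques used in the paper.

```python
import itertools
import math

import numpy as np

# Conventional FP power set (an assumption of this sketch); power 0 stands for log(x).
POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_term(x, p):
    """FP transformation x**p, with p = 0 read as log(x); x must be positive."""
    return np.log(x) if p == 0 else x ** p


def fp_basis(x, powers):
    """Columns for an FP1 (one power) or FP2 (two powers, p1 <= p2) function.
    A repeated power (p, p) yields x**p and x**p * log(x)."""
    cols = [fp_term(x, powers[0])]
    if len(powers) == 2:
        p1, p2 = powers
        second = fp_term(x, p2)
        cols.append(second * np.log(x) if p1 == p2 else second)
    return np.column_stack(cols)


def deviance(y, X=None):
    """Gaussian deviance up to an additive constant: n * log(RSS / n).
    An intercept is always included; X=None gives the null model."""
    n = len(y)
    A = np.ones((n, 1)) if X is None else np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * math.log(rss / n)


def chi2_sf(x, df):
    """Upper-tail chi-square probability for a positive integer df, built with
    Q(x; k + 2) = Q(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1)."""
    x = max(x, 0.0)
    k = 2 if df % 2 == 0 else 1
    q = math.exp(-x / 2) if k == 2 else math.erfc(math.sqrt(x / 2))
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def best_fp(x, y, degree):
    """Lowest-deviance FP1 power or FP2 power pair, returned as (deviance, powers)."""
    if degree == 1:
        candidates = [(p,) for p in POWERS]
    else:
        candidates = list(itertools.combinations_with_replacement(POWERS, 2))
    return min((deviance(y, fp_basis(x, c)), c) for c in candidates)


def fsp(x, y, alpha=0.05):
    """Closed test: FP2 vs null (4 df), FP2 vs linear (3 df), FP2 vs FP1 (2 df)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d_fp2, pow_fp2 = best_fp(x, y, 2)
    if chi2_sf(deviance(y) - d_fp2, 4) >= alpha:
        return "null", ()
    if chi2_sf(deviance(y, x[:, None]) - d_fp2, 3) >= alpha:
        return "linear", (1,)
    d_fp1, pow_fp1 = best_fp(x, y, 1)
    if chi2_sf(d_fp1 - d_fp2, 2) >= alpha:
        return "FP1", pow_fp1
    return "FP2", pow_fp2


def leave_one_out_changes(x, y, alpha=0.05):
    """Indices whose single removal changes the function class selected on all data."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    full = fsp(x, y, alpha)[0]
    keep = np.ones(len(x), dtype=bool)
    changed = []
    for i in range(len(x)):
        keep[i] = False
        if fsp(x[keep], y[keep], alpha)[0] != full:
            changed.append(i)
        keep[i] = True
    return full, changed


if __name__ == "__main__":
    rng = np.random.default_rng(2023)
    x = rng.uniform(0.5, 5.0, 100)
    y = np.log(x) + rng.normal(0.0, 0.1, 100)  # simulated log relationship
    print(fsp(x, y))
    print(leave_one_out_changes(x, y))
```

The printed selection and the flagged indices depend on the random draw. The closed-form chi-square recursion keeps the sketch dependent on NumPy alone. In practice a library routine such as `scipy.stats.chi2.sf` would be the usual choice.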