Developing clinical prediction models when adhering to minimum sample size recommendations: the importance of quantifying bootstrap variability in tuning parameters and predictive performance

Bibliographic Details
Main Authors:	Martin, GP, Riley, RD, Collins, GS, Sperrin, M
Format:	Journal article
Language:	English
Published:	SAGE Publications 2021

collection	OXFORD
description	Recent minimum sample size formulae (Riley et al.) for developing clinical prediction models help ensure that development datasets are of sufficient size to minimise overfitting. While these criteria are known to avoid excessive overfitting on average, the extent of variability in overfitting at recommended sample sizes is unknown. We investigated this through a simulation study and an empirical example, developing logistic regression clinical prediction models with unpenalised maximum likelihood estimation and various post-estimation shrinkage or penalisation methods. While the mean calibration slope was close to the ideal value of one for all methods, penalisation further reduced the level of overfitting, on average, compared to unpenalised methods. This came at the cost of higher variability in predictive performance for penalisation methods in external data. We recommend that penalisation methods are used in data that meet, or surpass, minimum sample size requirements to further mitigate overfitting, and that the variability in predictive performance and any tuning parameters is always examined as part of the model development process, since this provides information beyond average (optimism-adjusted) performance alone. Lower variability would give reassurance that the developed clinical prediction model will perform well in new individuals from the same population as that used for model development.
id	oxford-uuid:435cab35-5db9-4d3e-81ea-a8a5bd1a1b9b
institution	University of Oxford

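The abstract's recommendation to examine the spread of bootstrap performance, not only its average, can be illustrated with a small sketch. It assumes a single-predictor logistic model fitted by unpenalised maximum likelihood (the penalised and shrinkage methods studied in the paper are not shown). Each bootstrap replicate refits the model on a resample and computes its calibration slope in the original data. Because an unpenalised fit has an apparent calibration slope of exactly one, the mean of these slopes equals the usual bootstrap optimism-adjusted slope, while their SD and percentile range describe the variability. The function names, simulated data and number of replicates are hypothetical; this is not the authors' simulation code.

```python
import math
import random
import statistics


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Unpenalised ML fit of logit P(y=1) = b0 + b1*x by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: (b0, b1) += H^{-1} g, with H the 2x2 information matrix.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def calibration_slope(model, x, y):
    """Slope from a logistic regression of y on the model's linear predictor."""
    b0, b1 = model
    lp = [b0 + b1 * xi for xi in x]
    return fit_logistic(lp, y)[1]


def bootstrap_calibration_slopes(x, y, n_boot=200, seed=1):
    """Fit on each bootstrap resample; evaluate the calibration slope in the original data."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        model = fit_logistic([x[i] for i in idx], [y[i] for i in idx])
        slopes.append(calibration_slope(model, x, y))
    return slopes


if __name__ == "__main__":
    # Hypothetical simulated development data (not from the paper).
    rng = random.Random(2021)
    x = [rng.gauss(0.0, 1.0) for _ in range(300)]
    y = [1 if rng.random() < sigmoid(-1.0 + 0.8 * xi) else 0 for xi in x]

    apparent = calibration_slope(fit_logistic(x, y), x, y)
    slopes = bootstrap_calibration_slopes(x, y)
    q = statistics.quantiles(slopes, n=40)  # q[0] ~ 2.5th, q[-1] ~ 97.5th percentile
    print(f"apparent slope: {apparent:.3f}")
    print(f"bootstrap slope: mean {statistics.mean(slopes):.3f}, "
          f"SD {statistics.stdev(slopes):.3f}, 95% range [{q[0]:.3f}, {q[-1]:.3f}]")
```

Run as a script, it prints an apparent slope of 1.000 followed by the mean, SD and 2.5th to 97.5th percentile range of the bootstrap slopes; a wide range at an otherwise adequate sample size is the kind of signal the abstract suggests reporting alongside the average.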

Similar Items