Sample size requirements are not being considered in studies developing prediction models for binary outcomes: a systematic review

Abstract Background Having an appropriate sample size is important when developing a clinical prediction model. We aimed to review how sample size is considered in studies developing a prediction model for a binary outcome. Methods We searched PubMed for studies published between 01/07/2020 and 30/0...

Bibliographic Details
Main Authors:	Paula Dhiman, Jie Ma, Cathy Qi, Garrett Bullock, Jamie C Sergeant, Richard D Riley, Gary S Collins
Format:	Article
Language:	English
Published:	BMC 2023-08-01
Series:	BMC Medical Research Methodology
Subjects:	Sample size, Methodology, Prediction model
Online Access:	https://doi.org/10.1186/s12874-023-02008-1

_version_	1797558860505939968
author	Paula Dhiman, Jie Ma, Cathy Qi, Garrett Bullock, Jamie C Sergeant, Richard D Riley, Gary S Collins
author_facet	Paula Dhiman, Jie Ma, Cathy Qi, Garrett Bullock, Jamie C Sergeant, Richard D Riley, Gary S Collins
author_sort	Paula Dhiman
collection	DOAJ
description	Abstract. Background: Having an appropriate sample size is important when developing a clinical prediction model. We aimed to review how sample size is considered in studies developing a prediction model for a binary outcome. Methods: We searched PubMed for studies published between 01/07/2020 and 30/07/2020 and reviewed the sample size calculations used to develop the prediction models. Using the available information, we calculated the minimum sample size that would be needed to estimate overall risk and minimise overfitting in each study, and summarised the difference between the calculated and used sample size. Results: A total of 119 studies were included, of which nine (8%) provided a sample size justification. The recommended minimum sample size could be calculated for 94 studies: 73% (95% CI: 63–82%) used sample sizes lower than required to estimate overall risk and minimise overfitting, including 26% of studies that used sample sizes lower than required to estimate overall risk only. A similar proportion of studies did not meet the ≥ 10 EPV criterion (75%, 95% CI: 66–84%). The median deficit in the number of events used to develop a model was 75 (IQR: 234 lower to 7 higher), which reduced to 63 (IQR: 225 lower to 7 higher) if the total available data (before any data splitting) were used. Studies that met the minimum required sample size had a median c-statistic of 0.84 (IQR: 0.80 to 0.9), and studies where the minimum sample size was not met had a median c-statistic of 0.83 (IQR: 0.75 to 0.9). Studies that met the ≥ 10 EPP criterion had a median c-statistic of 0.80 (IQR: 0.73 to 0.84). Conclusions: Prediction models are often developed with no sample size calculation; as a consequence, many are too small to precisely estimate the overall risk. We encourage researchers to justify, perform and report sample size calculations when developing a prediction model.
first_indexed	2024-03-10T17:36:22Z
format	Article
id	doaj.art-3a6129ed94174161ad89c5c14fff47aa
institution	Directory of Open Access Journals
issn	1471-2288
language	English
last_indexed	2024-03-10T17:36:22Z
publishDate	2023-08-01
publisher	BMC
record_format	Article
series	BMC Medical Research Methodology
spelling	doaj.art-3a6129ed94174161ad89c5c14fff47aa; 2023-11-20T09:49:28Z; eng; BMC; BMC Medical Research Methodology; 1471-2288; 2023-08-01; vol. 23, no. 1, pp. 1–11; 10.1186/s12874-023-02008-1; Sample size requirements are not being considered in studies developing prediction models for binary outcomes: a systematic review; Paula Dhiman (Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford); Jie Ma (Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford); Cathy Qi (Population Data Science, Faculty of Medicine, Health and Life Science, Swansea University Medical School, Swansea University); Garrett Bullock (Department of Orthopaedic Surgery, Wake Forest School of Medicine); Jamie C Sergeant (Centre for Biostatistics, University of Manchester, Manchester Academic Health Science Centre); Richard D Riley (Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham); Gary S Collins (Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford); https://doi.org/10.1186/s12874-023-02008-1; Sample size, Methodology, Prediction model
title	Sample size requirements are not being considered in studies developing prediction models for binary outcomes: a systematic review
topic	Sample size, Methodology, Prediction model
url	https://doi.org/10.1186/s12874-023-02008-1