Adjustment for baseline characteristics in randomized trials using logistic regression: sample-based model versus true model

Abstract Background Adjustment for baseline prognostic factors in randomized clinical trials is usually performed by means of sample-based regression models. Sample-based models may be incorrect due to overfitting. To assess whether overfitting is a problem in practice, we used simulated data to exa...

Bibliographic Details
Main Authors: Thomas Perneger, Christophe Combescure, Antoine Poncet
Format: Article
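Language: English
Published: BMC 2023-02-01
Series: Trials
Subjects: Randomized clinical trials; Baseline imbalance; Statistical adjustment; Over-fitting; Simulation study
Online Access: https://doi.org/10.1186/s13063-022-07053-7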
_version_ 1797863629579616256
author Thomas Perneger
Christophe Combescure
Antoine Poncet
collection DOAJ
description Abstract. Background: Adjustment for baseline prognostic factors in randomized clinical trials is usually performed by means of sample-based regression models. Sample-based models may be incorrect due to overfitting. To assess whether overfitting is a problem in practice, we used simulated data to examine the performance of the sample-based model in comparison to a “true” adjustment model, in terms of estimation of the treatment effect. Methods: We conducted a simulation study using samples drawn from a “population” in which both the treatment effect and the effect of the potential confounder were specified. The outcome variable was binary. Using logistic regression, we compared three estimates of the treatment effect in each situation: unadjusted, adjusted for the confounder using the sample, adjusted for the confounder using the true effect. Experimental factors were sample size (from 2 × 50 to 2 × 1000), treatment effect (logit of 0, 0.5, or 1.0), confounder type (continuous or binary), and confounder effect (logit of 0, −0.5, or −1.0). The assessment criteria for the estimated treatment effect were bias, variance, precision (proportion of estimates within 0.1 logit units), type 1 error, and power. Results: Sample-based adjustment models yielded more biased estimates of the treatment effect than adjustment models that used the true confounder effect but had similar variance, accuracy, power, and type 1 error rates. The simulation also confirmed the conservative bias of unadjusted analyses due to the non-collapsibility of the odds ratio, the smaller variance of unadjusted estimates, and the bias of the odds ratio away from the null hypothesis in small datasets. Conclusions: Sample-based adjustment yields similar results to exact adjustment in estimating the treatment effect. Sample-based adjustment is preferable to no adjustment.
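The three estimates compared in the Methods can be illustrated with a minimal sketch. This is not the authors' code. It assumes a standard normal confounder, an intercept of 0, and that "adjusted using the true effect" means entering the confounder as an offset with its true coefficient; the function names (`fit_logistic`, `simulate_trial`, `three_estimates`) are hypothetical. Only the Python standard library is used, so the logistic fit is a small hand-written Newton-Raphson routine.

```python
import math
import random

def expit(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def fit_logistic(X, y, offset=None, iters=15):
    """Newton-Raphson maximum likelihood for logit P(y=1) = X.beta + offset."""
    k = len(X[0])
    off = offset or [0.0] * len(y)
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k                      # score vector X'(y - p)
        info = [[0.0] * k for _ in range(k)]  # information matrix X'WX
        for xi, yi, oi in zip(X, y, off):
            p = expit(sum(b * v for b, v in zip(beta, xi)) + oi)
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    info[a][c] += w * xi[a] * xi[c]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

def simulate_trial(n_per_arm, b_treat, b_conf, rng):
    """Two equal arms; intercept 0 and a N(0, 1) confounder are assumptions."""
    t = [0] * n_per_arm + [1] * n_per_arm
    x = [rng.gauss(0.0, 1.0) for _ in t]
    y = [1 if rng.random() < expit(b_treat * ti + b_conf * xi) else 0
         for ti, xi in zip(t, x)]
    return t, x, y

def three_estimates(t, x, y, b_conf_true):
    """Treatment log odds ratio: unadjusted, sample-adjusted, true-adjusted."""
    unadjusted = fit_logistic([[1.0, ti] for ti in t], y)[1]
    sample_adj = fit_logistic([[1.0, ti, xi] for ti, xi in zip(t, x)], y)[1]
    true_adj = fit_logistic([[1.0, ti] for ti in t], y,
                            offset=[b_conf_true * xi for xi in x])[1]
    return unadjusted, sample_adj, true_adj

# One simulated trial of 2 x 1000 with treatment logit 1.0 and confounder logit -1.0.
rng = random.Random(2023)
t, x, y = simulate_trial(1000, 1.0, -1.0, rng)
print(three_estimates(t, x, y, -1.0))
```

Repeating the last three lines over many simulated trials and summarizing the estimates against the specified treatment logit would give the bias and variance criteria named in the abstract. A single run only shows how the three estimates are obtained.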
first_indexed 2024-04-09T22:39:42Z
format Article
id doaj.art-c16110b37a8d4372b10ba6143827999f
institution Directory Open Access Journal
issn 1745-6215
language English
last_indexed 2024-04-09T22:39:42Z
publishDate 2023-02-01
publisher BMC
record_format Article
series Trials
spelling doaj.art-c16110b37a8d4372b10ba6143827999f | 2023-03-22T12:17:56Z | eng | BMC | Trials | 1745-6215 | 2023-02-01 | 24119 | 10.1186/s13063-022-07053-7 | Adjustment for baseline characteristics in randomized trials using logistic regression: sample-based model versus true model | Thomas Perneger 0 | Christophe Combescure 1 | Antoine Poncet 2 | Division of Clinical Epidemiology, University of Geneva and Geneva University Hospitals | Division of Clinical Epidemiology, University of Geneva and Geneva University Hospitals | Division of Clinical Epidemiology, University of Geneva and Geneva University Hospitals | Abstract. Background: Adjustment for baseline prognostic factors in randomized clinical trials is usually performed by means of sample-based regression models. Sample-based models may be incorrect due to overfitting. To assess whether overfitting is a problem in practice, we used simulated data to examine the performance of the sample-based model in comparison to a “true” adjustment model, in terms of estimation of the treatment effect. Methods: We conducted a simulation study using samples drawn from a “population” in which both the treatment effect and the effect of the potential confounder were specified. The outcome variable was binary. Using logistic regression, we compared three estimates of the treatment effect in each situation: unadjusted, adjusted for the confounder using the sample, adjusted for the confounder using the true effect. Experimental factors were sample size (from 2 × 50 to 2 × 1000), treatment effect (logit of 0, 0.5, or 1.0), confounder type (continuous or binary), and confounder effect (logit of 0, −0.5, or −1.0). The assessment criteria for the estimated treatment effect were bias, variance, precision (proportion of estimates within 0.1 logit units), type 1 error, and power. Results: Sample-based adjustment models yielded more biased estimates of the treatment effect than adjustment models that used the true confounder effect but had similar variance, accuracy, power, and type 1 error rates. The simulation also confirmed the conservative bias of unadjusted analyses due to the non-collapsibility of the odds ratio, the smaller variance of unadjusted estimates, and the bias of the odds ratio away from the null hypothesis in small datasets. Conclusions: Sample-based adjustment yields similar results to exact adjustment in estimating the treatment effect. Sample-based adjustment is preferable to no adjustment. | https://doi.org/10.1186/s13063-022-07053-7 | Randomized clinical trials | Baseline imbalance | Statistical adjustment | Over-fitting | Simulation study
title Adjustment for baseline characteristics in randomized trials using logistic regression: sample-based model versus true model
topic Randomized clinical trials
Baseline imbalance
Statistical adjustment
Over-fitting
Simulation study
url https://doi.org/10.1186/s13063-022-07053-7