Propensity scores as a novel method to guide sample allocation and minimize batch effects during the design of high throughput experiments

Abstract Background We developed a novel approach to minimize batch effects when assigning samples to batches. Our algorithm selects a batch allocation, among all possible ways of assigning samples to batches, that minimizes differences in average propensity score between batches. This strategy was...


Bibliographic Details
Main Authors:	Patrick M. Carry, Tim Vigers, Lauren A. Vanderlinden, Carson Keeter, Fran Dong, Teresa Buckner, Elizabeth Litkowski, Ivana Yang, Jill M. Norris, Katerina Kechris
Format:	Article
Language:	English
Published:	BMC 2023-03-01
Series:	BMC Bioinformatics
Subjects:	Batch effects, Propensity scores, Batch effect adjustment, ComBat
Online Access:	https://doi.org/10.1186/s12859-023-05202-6

_version_	1797863368769404928
author	Patrick M. Carry, Tim Vigers, Lauren A. Vanderlinden, Carson Keeter, Fran Dong, Teresa Buckner, Elizabeth Litkowski, Ivana Yang, Jill M. Norris, Katerina Kechris
collection	DOAJ
description	Abstract
Background: We developed a novel approach to minimize batch effects when assigning samples to batches. Our algorithm selects a batch allocation, among all possible ways of assigning samples to batches, that minimizes differences in average propensity score between batches. This strategy was compared to randomization and stratified randomization in a case–control study (30 per group) with a covariate (case vs control, represented as β1, set to be null) and two biologically relevant confounding variables (age, represented as β2, and hemoglobin A1c (HbA1c), represented as β3). Gene expression values were obtained from a publicly available dataset of expression data from pancreatic islet cells. Batch effects were simulated as twice the median biological variation across the gene expression dataset and were added to the publicly available dataset to simulate a batch effect condition. Bias was calculated as the absolute difference between the observed betas under each batch allocation strategy and the true beta (no batch effects). Bias was also evaluated after adjustment for batch effects using ComBat as well as a linear regression model. To understand the performance of our optimal allocation strategy under the alternative hypothesis, we also evaluated bias at a single gene associated with both age and HbA1c levels in the ‘true’ dataset (CAPN13 gene).
Results: Before batch correction, under the null hypothesis (β1), maximum absolute bias and the root mean square (RMS) of maximum absolute bias were minimized using the optimal allocation strategy. Under the alternative hypothesis (β2 and β3 for the CAPN13 gene), maximum absolute bias and RMS of maximum absolute bias were also consistently lower using the optimal allocation strategy. ComBat and the regression batch adjustment methods performed well, as the bias estimates moved towards the true values in all conditions under both the null and alternative hypotheses. Although the differences between methods were less pronounced following batch correction, estimates of bias (average and RMS) were consistently lower using the optimal allocation strategy under both the null and alternative hypotheses.
Conclusions: Our algorithm provides an extremely flexible and effective method for assigning samples to batches by exploiting knowledge of covariates prior to sample allocation.
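The allocation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes two equal-size batches, a ridge-stabilized logistic model for batch membership fitted by plain gradient ascent, and an exhaustive search that is only feasible for very small sample sizes. All function names and the sample values (case status, age, HbA1c) are hypothetical.

```python
import itertools
import math


def standardize(rows):
    """Scale each covariate column to mean 0 and (population) SD 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def predict(w, xi):
    """Logistic probability of batch 1 for one sample (w[0] is the intercept)."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1.0 / (1.0 + math.exp(-z))


def propensity_scores(x, batch, steps=300, lr=0.5, ridge=0.01):
    """Fit P(batch = 1 | covariates) by gradient ascent; return fitted scores."""
    n, p = len(x), len(x[0])
    w = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for xi, bi in zip(x, batch):
            err = bi - predict(w, xi)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        # Assumption: a small ridge penalty keeps the weights finite when the
        # covariates separate the two batches perfectly.
        w = [wj + lr * (g / n - ridge * wj) for wj, g in zip(w, grad)]
    return [predict(w, xi) for xi in x]


def imbalance(x, batch):
    """Absolute difference in mean propensity score between the two batches."""
    ps = propensity_scores(x, batch)
    in1 = [s for s, b in zip(ps, batch) if b == 1]
    in0 = [s for s, b in zip(ps, batch) if b == 0]
    return abs(sum(in1) / len(in1) - sum(in0) / len(in0))


def best_allocation(covariates):
    """Try every equal-size two-batch split and keep the most balanced one."""
    x = standardize(covariates)
    n = len(x)
    best = None
    for chosen in itertools.combinations(range(n), n // 2):
        batch = [1 if i in chosen else 0 for i in range(n)]
        score = imbalance(x, batch)
        if best is None or score < best[0]:
            best = (score, batch)
    return best


# Hypothetical samples: (case = 1 / control = 0, age in years, HbA1c in %)
samples = [
    (1, 34, 5.4), (1, 61, 7.9), (1, 47, 6.2), (1, 55, 8.3),
    (0, 29, 5.1), (0, 66, 5.8), (0, 42, 5.5), (0, 58, 6.0),
]
best_score, best_batch = best_allocation(samples)
print("batch labels:", best_batch)
print("difference in mean propensity score:", round(best_score, 4))
```

When the covariates are well balanced across batches, the fitted scores stay close to 0.5 for every sample and the difference in batch means approaches zero. An allocation that puts all cases in one batch is perfectly predictable from case status and therefore scores poorly. For a study of the size described (60 samples), enumerating every split is impractical, so a search over randomly drawn candidate allocations would be a natural substitute in this sketch.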
first_indexed	2024-04-09T22:35:33Z
format	Article
id	doaj.art-f5f3161d1212472ba1fb4cc3ce02231b
institution	Directory of Open Access Journals
issn	1471-2105
language	English
last_indexed	2024-04-09T22:35:33Z
publishDate	2023-03-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-f5f3161d1212472ba1fb4cc3ce02231b; 2023-03-22T12:33:21Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2023-03-01; volume 24, issue 1, pages 1–18; 10.1186/s12859-023-05202-6
	Propensity scores as a novel method to guide sample allocation and minimize batch effects during the design of high throughput experiments
	Patrick M. Carry (Colorado Program for Musculoskeletal Research, Department of Orthopedics, University of Colorado Anschutz Medical Campus)
	Tim Vigers (Department of Biostatistics and Informatics, Colorado School of Public Health)
	Lauren A. Vanderlinden (Department of Epidemiology, Colorado School of Public Health)
	Carson Keeter (Colorado Program for Musculoskeletal Research, Department of Orthopedics, University of Colorado Anschutz Medical Campus)
	Fran Dong (Barbara Davis Center for Diabetes, University of Colorado Anschutz Medical Campus)
	Teresa Buckner (Department of Epidemiology, Colorado School of Public Health)
	Elizabeth Litkowski (Department of Epidemiology, Colorado School of Public Health)
	Ivana Yang (Department of Medicine, University of Colorado Anschutz Medical Campus)
	Jill M. Norris (Department of Epidemiology, Colorado School of Public Health)
	Katerina Kechris (Department of Biostatistics and Informatics, Colorado School of Public Health)
	https://doi.org/10.1186/s12859-023-05202-6
	Batch effects, Propensity scores, Batch effect adjustment, ComBat
title	Propensity scores as a novel method to guide sample allocation and minimize batch effects during the design of high throughput experiments
topic	Batch effects, Propensity scores, Batch effect adjustment, ComBat
url	https://doi.org/10.1186/s12859-023-05202-6


Similar Items