Removing batch effects for prediction problems with frozen surrogate variable analysis

Batch effects are responsible for the failure of promising genomic prognostic signatures, major ambiguities in published genomic results, and retractions of widely publicized findings. Batch effect corrections have been developed to remove these artifacts, but they are designed to be used in populat...

Bibliographic Details
Main Authors: Hilary S. Parker, Héctor Corrada Bravo, Jeffrey T. Leek
Format: Article
Language: English
Published: PeerJ Inc. 2014-09-01
Series: PeerJ
Subjects: Batch effects; Surrogate variable analysis; Prediction; Machine learning; Database; Statistics
Online Access: https://peerj.com/articles/561.pdf
author Hilary S. Parker
Héctor Corrada Bravo
Jeffrey T. Leek
collection DOAJ
description Batch effects are responsible for the failure of promising genomic prognostic signatures, major ambiguities in published genomic results, and retractions of widely publicized findings. Batch effect corrections have been developed to remove these artifacts, but they are designed to be used in population studies. However, genomic technologies are beginning to be used in clinical applications where samples are analyzed one at a time for diagnostic, prognostic, and predictive applications. There are currently no batch correction methods that have been developed specifically for prediction. In this paper, we propose a new method called frozen surrogate variable analysis (fSVA) that borrows strength from a training set for individual sample batch correction. We show that fSVA improves prediction accuracy in simulations and in public genomic studies. fSVA is available as part of the sva Bioconductor package.
first_indexed 2024-03-09T08:17:47Z
format Article
id doaj.art-9349d45564594fd89f1b6c4ec07c2cf2
institution Directory of Open Access Journals
issn 2167-8359
language English
last_indexed 2024-03-09T08:17:47Z
publishDate 2014-09-01
publisher PeerJ Inc.
record_format Article
series PeerJ
spelling doaj.art-9349d45564594fd89f1b6c4ec07c2cf2; 2023-12-02T22:00:45Z; eng; PeerJ Inc.; PeerJ; 2167-8359; 2014-09-01; 2; e561; 10.7717/peerj.561; 561
Removing batch effects for prediction problems with frozen surrogate variable analysis
Hilary S. Parker (0); Héctor Corrada Bravo (1); Jeffrey T. Leek (2)
(0) Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
(1) Center for Bioinformatics and Computational Biology, Department of Computer Science, University of Maryland, College Park, MD, USA
(2) Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
Batch effects are responsible for the failure of promising genomic prognostic signatures, major ambiguities in published genomic results, and retractions of widely publicized findings. Batch effect corrections have been developed to remove these artifacts, but they are designed to be used in population studies. However, genomic technologies are beginning to be used in clinical applications where samples are analyzed one at a time for diagnostic, prognostic, and predictive applications. There are currently no batch correction methods that have been developed specifically for prediction. In this paper, we propose a new method called frozen surrogate variable analysis (fSVA) that borrows strength from a training set for individual sample batch correction. We show that fSVA improves prediction accuracy in simulations and in public genomic studies. fSVA is available as part of the sva Bioconductor package.
https://peerj.com/articles/561.pdf
Batch effects; Surrogate variable analysis; Prediction; Machine learning; Database; Statistics
title Removing batch effects for prediction problems with frozen surrogate variable analysis
topic Batch effects
Surrogate variable analysis
Prediction
Machine learning
Database
Statistics
url https://peerj.com/articles/561.pdf