Bayesian copy number detection and association in large-scale studies

Abstract Background Germline copy number variants (CNVs) increase risk for many diseases, yet detection of CNVs and quantifying their contribution to disease risk in large-scale studies is challenging due to biological and technical sources of heterogeneity that vary across the genome within and bet...


Bibliographic Details
Main Authors: Stephen Cristiano, David McKean, Jacob Carey, Paige Bracci, Paul Brennan, Michael Chou, Mengmeng Du, Steven Gallinger, Michael G. Goggins, Manal M. Hassan, Rayjean J. Hung, Robert C. Kurtz, Donghui Li, Lingeng Lu, Rachel Neale, Sara Olson, Gloria Petersen, Kari G. Rabe, Jack Fu, Harvey Risch, Gary L. Rosner, Ingo Ruczinski, Alison P. Klein, Robert B. Scharpf
Format: Article
Language: English
Published: BMC 2020-09-01
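Series: BMC Cancer
Subjects: Pancreatic cancer; SNP array; Copy number variants; Genome-wide association; CNPBayes; Batch effects
Online Access: http://link.springer.com/article/10.1186/s12885-020-07304-3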
author Stephen Cristiano
David McKean
Jacob Carey
Paige Bracci
Paul Brennan
Michael Chou
Mengmeng Du
Steven Gallinger
Michael G. Goggins
Manal M. Hassan
Rayjean J. Hung
Robert C. Kurtz
Donghui Li
Lingeng Lu
Rachel Neale
Sara Olson
Gloria Petersen
Kari G. Rabe
Jack Fu
Harvey Risch
Gary L. Rosner
Ingo Ruczinski
Alison P. Klein
Robert B. Scharpf
author_sort Stephen Cristiano
collection DOAJ
description Abstract. Background: Germline copy number variants (CNVs) increase risk for many diseases, yet detecting CNVs and quantifying their contribution to disease risk in large-scale studies is challenging because biological and technical sources of heterogeneity vary across the genome within and between samples. Methods: We developed an approach called CNPBayes to identify latent batch effects in genome-wide association studies involving copy number, to provide probabilistic estimates of integer copy number across the estimated batches, and to fully integrate the copy number uncertainty in the association model for disease. Results: Applying a hidden Markov model (HMM) to identify CNVs in a large multi-site Pancreatic Cancer Case Control study (PanC4) of 7598 participants, we found CNV inference was highly sensitive to technical noise that varied appreciably among participants. Applying CNPBayes to this dataset, we found that the major sources of technical variation were linked to sample processing by the centralized laboratory and not the individual study sites. Modeling the latent batch effects at each CNV region hierarchically, we developed probabilistic estimates of copy number that were directly incorporated in a Bayesian regression model for pancreatic cancer risk. Candidate associations aided by this approach include deletions of 8q24 near regulatory elements of the tumor oncogene MYC and of Tumor Suppressor Candidate 3 (TUSC3). Conclusions: Laboratory effects may not account for the major sources of technical variation in genome-wide association studies. This study provides a robust Bayesian inferential framework for identifying latent batch effects, estimating copy number, and evaluating the role of copy number in heritable diseases.
first_indexed 2024-12-11T23:49:36Z
format Article
id doaj.art-ca17e3298f774a5f858a0abae0f47a86
institution Directory of Open Access Journals
issn 1471-2407
language English
last_indexed 2024-12-11T23:49:36Z
publishDate 2020-09-01
publisher BMC
record_format Article
series BMC Cancer
spelling doaj.art-ca17e3298f774a5f858a0abae0f47a86 2022-12-22T00:45:31Z eng BMC BMC Cancer 1471-2407 2020-09-01 201114 10.1186/s12885-020-07304-3
0 Stephen Cristiano, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
1 David McKean, Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine
2 Jacob Carey, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
3 Paige Bracci, Department of Epidemiology and Biostatistics, University of California, San Francisco
4 Paul Brennan, Genetics Section, International Agency for Research on Cancer
5 Michael Chou, Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health
6 Mengmeng Du, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center
7 Steven Gallinger, Lunenfeld-Tanenbaum Research Institute of Mount Sinai Hospital
8 Michael G. Goggins, Department of Medicine, Johns Hopkins University School of Medicine
9 Manal M. Hassan, Department of Epidemiology, Cancer Prevention & Population Sciences, UT MD Anderson Cancer Center
10 Rayjean J. Hung, Lunenfeld-Tanenbaum Research Institute of Mount Sinai Hospital
11 Robert C. Kurtz, Department of Gastroenterology, Hepatology, and Nutrition Service, Memorial Sloan Kettering Cancer Center
12 Donghui Li, Department of Gastrointestinal Medical Oncology, University of Texas MD Anderson Cancer Center
13 Lingeng Lu, Department of Chronic Disease Epidemiology, Yale School of Public Health, Yale Cancer Center
14 Rachel Neale, Population Health Department, QIMR Berghofer Medical Research Institute
15 Sara Olson, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center
16 Gloria Petersen, Department of Health Sciences Research, Mayo Clinic College of Medicine
17 Kari G. Rabe, Department of Health Sciences Research, Mayo Clinic College of Medicine
18 Jack Fu, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
19 Harvey Risch, Department of Chronic Disease Epidemiology, Yale School of Public Health, Yale Cancer Center
20 Gary L. Rosner, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
21 Ingo Ruczinski, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
22 Alison P. Klein, Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine
23 Robert B. Scharpf, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
title Bayesian copy number detection and association in large-scale studies
title_sort bayesian copy number detection and association in large scale studies
topic Pancreatic cancer
SNP array
Copy number variants
Genome-wide association
CNPBayes
Batch effects
url http://link.springer.com/article/10.1186/s12885-020-07304-3