Bayesian copy number detection and association in large-scale studies
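Main Authors: | Stephen Cristiano, David McKean, Jacob Carey, Paige Bracci, Paul Brennan, Michael Chou, Mengmeng Du, Steven Gallinger, Michael G. Goggins, Manal M. Hassan, Rayjean J. Hung, Robert C. Kurtz, Donghui Li, Lingeng Lu, Rachel Neale, Sara Olson, Gloria Petersen, Kari G. Rabe, Jack Fu, Harvey Risch, Gary L. Rosner, Ingo Ruczinski, Alison P. Klein, Robert B. Scharpf |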
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2020-09-01 |
Series: | BMC Cancer |
Subjects: | |
Online Access: | http://link.springer.com/article/10.1186/s12885-020-07304-3 |
author | Stephen Cristiano, David McKean, Jacob Carey, Paige Bracci, Paul Brennan, Michael Chou, Mengmeng Du, Steven Gallinger, Michael G. Goggins, Manal M. Hassan, Rayjean J. Hung, Robert C. Kurtz, Donghui Li, Lingeng Lu, Rachel Neale, Sara Olson, Gloria Petersen, Kari G. Rabe, Jack Fu, Harvey Risch, Gary L. Rosner, Ingo Ruczinski, Alison P. Klein, Robert B. Scharpf |
author_sort | Stephen Cristiano |
collection | DOAJ |
description | Background: Germline copy number variants (CNVs) increase risk for many diseases, yet detecting CNVs and quantifying their contribution to disease risk in large-scale studies is challenging because of biological and technical sources of heterogeneity that vary across the genome, both within and between samples. Methods: We developed an approach called CNPBayes to identify latent batch effects in genome-wide association studies involving copy number, to provide probabilistic estimates of integer copy number across the estimated batches, and to fully integrate copy number uncertainty into the association model for disease. Results: Applying a hidden Markov model (HMM) to identify CNVs in a large multi-site Pancreatic Cancer Case Control study (PanC4) of 7598 participants, we found that CNV inference was highly sensitive to technical noise that varied appreciably among participants. Applying CNPBayes to this dataset, we found that the major sources of technical variation were linked to sample processing by the centralized laboratory rather than to the individual study sites. By modeling the latent batch effects hierarchically at each CNV region, we obtained probabilistic estimates of copy number that were incorporated directly into a Bayesian regression model for pancreatic cancer risk. Candidate associations identified with the aid of this approach include deletions of 8q24 near regulatory elements of the oncogene MYC and deletions of Tumor Suppressor Candidate 3 (TUSC3). Conclusions: Laboratory effects may not account for the major sources of technical variation in genome-wide association studies. This study provides a robust Bayesian inferential framework for identifying latent batch effects, estimating copy number, and evaluating the role of copy number in heritable diseases. |
first_indexed | 2024-12-11T23:49:36Z |
id | doaj.art-ca17e3298f774a5f858a0abae0f47a86 |
institution | Directory of Open Access Journals |
issn | 1471-2407 |
last_indexed | 2024-12-11T23:49:36Z |
spelling | doaj.art-ca17e3298f774a5f858a0abae0f47a86; record updated 2022-12-22T00:45:31Z; English; BMC; BMC Cancer; ISSN 1471-2407; published 2020-09-01; vol. 20, no. 1, pp. 1–14; doi:10.1186/s12885-020-07304-3; Bayesian copy number detection and association in large-scale studies; Stephen Cristiano (Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health); David McKean (Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine); Jacob Carey (Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health); Paige Bracci (Department of Epidemiology and Biostatistics, University of California, San Francisco); Paul Brennan (Genetics Section, International Agency for Research on Cancer); Michael Chou (Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health); Mengmeng Du (Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center); Steven Gallinger (Lunenfeld-Tanenbaum Research Institute of Mount Sinai Hospital); Michael G. Goggins (Department of Medicine, Johns Hopkins University School of Medicine); Manal M. Hassan (Department of Epidemiology, Cancer Prevention & Population Sciences, UT MD Anderson Cancer Center); Rayjean J. Hung (Lunenfeld-Tanenbaum Research Institute of Mount Sinai Hospital); Robert C. Kurtz (Department of Gastroenterology, Hepatology, and Nutrition Service, Memorial Sloan Kettering Cancer Center); Donghui Li (Department of Gastrointestinal Medical Oncology, University of Texas MD Anderson Cancer Center); Lingeng Lu (Department of Chronic Disease Epidemiology, Yale School of Public Health, Yale Cancer Center); Rachel Neale (Population Health Department, QIMR Berghofer Medical Research Institute); Sara Olson (Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center); Gloria Petersen (Department of Health Sciences Research, Mayo Clinic College of Medicine); Kari G. Rabe (Department of Health Sciences Research, Mayo Clinic College of Medicine); Jack Fu (Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health); Harvey Risch (Department of Chronic Disease Epidemiology, Yale School of Public Health, Yale Cancer Center); Gary L. Rosner (Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health); Ingo Ruczinski (Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health); Alison P. Klein (Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine); Robert B. Scharpf (Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health) |
title | Bayesian copy number detection and association in large-scale studies |
topic | Pancreatic cancer; SNP array; Copy number variants; Genome-wide association; CNPBayes; Batch effects |
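The description field says CNPBayes produces probabilistic estimates of integer copy number within estimated batches and carries that uncertainty into the association model. The Python sketch here illustrates only the general idea under explicit assumptions and is not the CNPBayes implementation: each hypothetical batch has its own Gaussian mixture over copy-number states (illustrative weights, means, and standard deviations for a one-dimensional CNV-region summary, assumed to be a median log R ratio), Bayes' rule gives each sample's posterior copy-number probabilities, and the posterior mean serves as a soft dosage. Using a posterior-mean dosage as a regression covariate is a common simplification; a fully Bayesian regression that integrates over copy number, as the abstract describes, is not attempted here.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical per-batch mixture: copy number -> (weight, mean, sd) of a
# one-dimensional CNV-region summary (assumed here to be a median log R ratio).
# All numbers are illustrative assumptions, not estimates from PanC4.
BatchParams = Dict[int, Tuple[float, float, float]]


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def copy_number_posterior(x: float, params: BatchParams) -> Dict[int, float]:
    """Posterior P(copy number = k | x) for one sample via Bayes' rule."""
    unnormalized = {k: w * normal_pdf(x, mu, sd) for k, (w, mu, sd) in params.items()}
    total = sum(unnormalized.values())
    return {k: v / total for k, v in unnormalized.items()}


def expected_dosage(posterior: Dict[int, float]) -> float:
    """Posterior mean copy number (a soft dosage instead of a hard call)."""
    return sum(k * p for k, p in posterior.items())


# Two hypothetical batches; batch_B's component means are shifted upward by a
# technical offset of 0.15, so identical raw values imply different copy numbers.
batches: Dict[str, BatchParams] = {
    "batch_A": {1: (0.05, -0.60, 0.15), 2: (0.95, 0.00, 0.10)},
    "batch_B": {1: (0.05, -0.45, 0.15), 2: (0.95, 0.15, 0.10)},
}

samples: List[Tuple[str, float]] = [("batch_A", -0.30), ("batch_B", -0.30), ("batch_B", 0.12)]

for batch, x in samples:
    post = copy_number_posterior(x, batches[batch])
    rounded = {k: round(p, 3) for k, p in post.items()}
    print(batch, x, rounded, round(expected_dosage(post), 3))
```

With these illustrative parameters, the same observed value of -0.30 is assigned mostly to copy number 2 in batch_A but almost certainly to copy number 1 in batch_B. This batch-dependent shift is the kind of effect that motivates modeling batches explicitly rather than applying a single set of thresholds to every sample.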