Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics

Genome-wide Association Studies (GWAS) result in millions of summary statistics ("z-scores") for single nucleotide polymorphism (SNP) associations with phenotypes. These rich datasets afford deep insights into the nature and extent of genetic contributions to complex phenotypes such as p...

Bibliographic Details
Main Authors: Dominic Holland, Yunpeng Wang, Wesley K. Thompson, Andrew Schork, Chi-Hua Chen, Min-Tzu Lo, Aree Witoelar, Thomas Werge, Michael O'Donovan, Ole A. Andreassen, Anders M. Dale
Format: Article
Language: English
Published: Frontiers Media S.A. 2016-02-01
Series: Frontiers in Genetics
Subjects: Putamen, Schizophrenia, GWAS, effect size, SNP discovery, Gaussian mixture model
Online Access: http://journal.frontiersin.org/Journal/10.3389/fgene.2016.00015/full
author Dominic Holland
Yunpeng Wang
Wesley K. Thompson
Andrew Schork
Chi-Hua Chen
Min-Tzu Lo
Aree Witoelar
Thomas Werge
Michael O'Donovan
Ole A. Andreassen
Anders M. Dale
author_sort Dominic Holland
collection DOAJ
description Genome-wide Association Studies (GWAS) produce millions of summary statistics ("z-scores") for single nucleotide polymorphism (SNP) associations with phenotypes. These rich datasets afford deep insights into the nature and extent of genetic contributions to complex phenotypes such as psychiatric disorders, which are understood to have substantial genetic components that arise from very large numbers of SNPs. The complexity of the datasets, however, poses a significant challenge to maximizing their utility. This is reflected in a need for a better understanding of the landscape of z-scores, as such knowledge would enhance causal SNP and gene discovery, help elucidate mechanistic pathways, and inform future study design. Here we present a parsimonious methodology for modeling effect sizes and replication probabilities, relying only on summary statistics from GWAS substudies, and a scheme allowing for direct empirical validation. We show that modeling z-scores as a mixture of Gaussians is conceptually appropriate, in particular taking into account ubiquitous non-null effects that are likely in the datasets due to weak linkage disequilibrium with causal SNPs. The four-parameter model allows for estimating the degree of polygenicity of the phenotype and predicting the proportion of chip heritability explainable by genome-wide significant SNPs in future studies with larger sample sizes. We apply the model to recent GWAS of schizophrenia (N=82,315) and putamen volume (N=12,596), with approximately 9.3 million SNP z-scores in both cases. We show that, over a broad range of z-scores and sample sizes, the model accurately predicts expectation estimates of true effect sizes and replication probabilities in multistage GWAS designs. We assess the degree to which effect sizes are overestimated when based on linear-regression association coefficients. We estimate the polygenicity of schizophrenia to be 0.037 and that of putamen volume to be 0.001, while the respective sample sizes required to approach fully explaining the chip heritability are $10^6$ and $10^5$. The model can be extended to incorporate prior knowledge such as pleiotropy and SNP annotation. The current findings suggest that the model is applicable to a broad array of complex phenotypes and will enhance understanding of their genetic architectures.
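The idea of treating z-scores as a mixture of Gaussians can be illustrated with a minimal sketch. This is not the article's four-parameter model, whose exact form is not given in this record. It assumes a simpler, generic two-component mixture in which a fraction `pi1` of SNPs have a true effect drawn from a zero-mean Gaussian with variance `var_effect`, and every observed z-score carries Gaussian noise with variance `var_null`. All function names and the variance values are hypothetical; only the value 0.037 for `pi1` echoes the polygenicity estimate quoted above.

```python
import math


def normal_pdf(x, var):
    """Density at x of a zero-mean Gaussian with variance `var`."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def posterior_nonnull(z, pi1, var_null, var_effect):
    """Posterior probability that a SNP with observed z-score z is non-null.

    Assumed model (illustrative only):
      null SNPs:      z ~ N(0, var_null)
      non-null SNPs:  z ~ N(0, var_null + var_effect)
    """
    non_null = pi1 * normal_pdf(z, var_null + var_effect)
    null = (1.0 - pi1) * normal_pdf(z, var_null)
    return non_null / (non_null + null)


def expected_effect(z, pi1, var_null, var_effect):
    """Posterior mean of the true effect (in z-score units) given z.

    For non-null SNPs the Gaussian prior shrinks z by the factor
    var_effect / (var_null + var_effect); null SNPs contribute zero.
    """
    shrink = var_effect / (var_null + var_effect)
    return posterior_nonnull(z, pi1, var_null, var_effect) * shrink * z


if __name__ == "__main__":
    # pi1 echoes the schizophrenia polygenicity in the abstract;
    # the two variances are made-up illustrative values.
    pi1, var_null, var_effect = 0.037, 1.0, 4.0
    for z in (1.0, 3.0, 5.5):
        p = posterior_nonnull(z, pi1, var_null, var_effect)
        e = expected_effect(z, pi1, var_null, var_effect)
        print(f"z={z:4.1f}  P(non-null|z)={p:.4f}  E[effect|z]={e:.3f}")
```

Under these assumptions the expected true effect is always smaller in magnitude than the observed z-score, which is one way to see why raw association estimates for top hits tend to overstate effect sizes.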
first_indexed 2024-12-22T22:17:58Z
format Article
id doaj.art-69b87a75249444e8b9ed934b00e2c29e
institution Directory Open Access Journal
issn 1664-8021
language English
last_indexed 2024-12-22T22:17:58Z
publishDate 2016-02-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Genetics
spelling doaj.art-69b87a75249444e8b9ed934b00e2c29e; 2022-12-21T18:10:44Z; eng; Frontiers Media S.A.; Frontiers in Genetics; 1664-8021; 2016-02-01; 7; 10.3389/fgene.2016.00015; 182296
Author affiliations:
Dominic Holland: University of California San Diego (UCSD)
Yunpeng Wang: University of California San Diego (UCSD); University of Oslo; Oslo University Hospital
Wesley K. Thompson: University of California San Diego
Andrew Schork: University of California San Diego
Chi-Hua Chen: University of California San Diego
Min-Tzu Lo: University of California San Diego
Aree Witoelar: University of Oslo; Oslo University Hospital
Thomas Werge: Sct. Hans Hospital and University of Copenhagen
Michael O'Donovan: School of Medicine, Cardiff University
Ole A. Andreassen: University of Oslo; Oslo University Hospital
Anders M. Dale: University of California San Diego (UCSD)
title Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics
topic Putamen
Schizophrenia
GWAS
effect size
SNP discovery
Gaussian mixture model
url http://journal.frontiersin.org/Journal/10.3389/fgene.2016.00015/full