Resetting the bar: Statistical significance in whole-genome sequencing-based association studies of global populations

Genome-wide association studies (GWAS) of common disease have been hugely successful in implicating loci that modify disease risk. The bulk of these associations have proven robust and reproducible, in part due to community adoption of statistical criteria for claiming significant genotype-phenotype...

Bibliographic Details
Main Authors: Pulit, S, , d, de Bakker, P
Format: Journal article
Language: English
Published: Wiley 2016
collection OXFORD
description Genome-wide association studies (GWAS) of common disease have been hugely successful in implicating loci that modify disease risk. The bulk of these associations have proven robust and reproducible, in part due to community adoption of statistical criteria for claiming significant genotype-phenotype associations. As the cost of sequencing continues to drop, assembling large samples in global populations is becoming increasingly feasible. Sequencing studies interrogate not only common variants, as was true for genotyping-based GWAS, but variation across the full allele frequency spectrum, yielding many more (independent) statistical tests. We sought to empirically determine genome-wide significance thresholds for various analysis scenarios. Using whole-genome sequence data, we simulated sequencing-based disease studies of varying sample size and ancestry. We determined that future sequencing efforts in >2,000 samples of European, Asian, or admixed ancestry should set genome-wide significance at approximately P = 5 × 10⁻⁹, and studies of African samples should apply a more stringent genome-wide significance threshold of P = 1 × 10⁻⁹. Adoption of a revised multiple test correction will be crucial in avoiding irreproducible claims of association.
id oxford-uuid:286135b0-847e-4732-a5d2-452d454f0264
institution University of Oxford