Efficient Bayesian mixed-model analysis increases association power in large cohorts

Linear mixed models are a powerful statistical tool for identifying genetic associations and avoiding confounding. However, existing methods are computationally intractable in large cohorts and may not optimize power. All existing methods require time cost O(MN^2) (where N is the number of samples an...

Bibliographic Details
Main Authors: Loh, Po-Ru, Bulik-Sullivan, Brendan K, Vilhjálmsson, Bjarni J, Salem, Rany M, Chasman, Daniel I, Ridker, Paul M, Neale, Benjamin M, Patterson, Nick, Price, Alkes L, Tucker, George Jay, Finucane, Hilary Kiyo, Berger Leighton, Bonnie
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: en_US
Published: Nature Publishing Group 2017
Online Access: http://hdl.handle.net/1721.1/110185
https://orcid.org/0000-0003-3864-9828
https://orcid.org/0000-0002-2724-7228
author Loh, Po-Ru
Bulik-Sullivan, Brendan K
Vilhjálmsson, Bjarni J
Salem, Rany M
Chasman, Daniel I
Ridker, Paul M
Neale, Benjamin M
Patterson, Nick
Price, Alkes L
Tucker, George Jay
Finucane, Hilary Kiyo
Berger Leighton, Bonnie
author2 Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
author_sort Loh, Po-Ru
collection MIT
description Linear mixed models are a powerful statistical tool for identifying genetic associations and avoiding confounding. However, existing methods are computationally intractable in large cohorts and may not optimize power. All existing methods require time cost O(MN^2) (where N is the number of samples and M is the number of SNPs) and implicitly assume an infinitesimal genetic architecture in which effect sizes are normally distributed, which can limit power. Here we present a far more efficient mixed-model association method, BOLT-LMM, which requires only a small number of O(MN) time iterations and increases power by modeling more realistic, non-infinitesimal genetic architectures via a Bayesian mixture prior on marker effect sizes. We applied BOLT-LMM to 9 quantitative traits in 23,294 samples from the Women's Genome Health Study (WGHS) and observed significant increases in power, consistent with simulations. Theory and simulations show that the boost in power increases with cohort size, making BOLT-LMM appealing for genome-wide association studies in large cohorts.
first_indexed 2024-09-23T14:04:39Z
format Article
id mit-1721.1/110185
institution Massachusetts Institute of Technology
language en_US
last_indexed 2024-09-23T14:04:39Z
publishDate 2017
publisher Nature Publishing Group
record_format dspace
spelling mit-1721.1/110185 2022-09-28T18:13:30Z
Efficient Bayesian mixed-model analysis increases association power in large cohorts
Loh, Po-Ru
Bulik-Sullivan, Brendan K
Vilhjálmsson, Bjarni J
Salem, Rany M
Chasman, Daniel I
Ridker, Paul M
Neale, Benjamin M
Patterson, Nick
Price, Alkes L
Tucker, George Jay
Finucane, Hilary Kiyo
Berger Leighton, Bonnie
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology. Department of Mathematics
Tucker, George Jay
Finucane, Hilary Kiyo
Berger Leighton, Bonnie
Linear mixed models are a powerful statistical tool for identifying genetic associations and avoiding confounding. However, existing methods are computationally intractable in large cohorts and may not optimize power. All existing methods require time cost O(MN^2) (where N is the number of samples and M is the number of SNPs) and implicitly assume an infinitesimal genetic architecture in which effect sizes are normally distributed, which can limit power. Here we present a far more efficient mixed-model association method, BOLT-LMM, which requires only a small number of O(MN) time iterations and increases power by modeling more realistic, non-infinitesimal genetic architectures via a Bayesian mixture prior on marker effect sizes. We applied BOLT-LMM to 9 quantitative traits in 23,294 samples from the Women's Genome Health Study (WGHS) and observed significant increases in power, consistent with simulations. Theory and simulations show that the boost in power increases with cohort size, making BOLT-LMM appealing for genome-wide association studies in large cohorts.
National Institutes of Health (U.S.) (grant R01 HG006399)
National Institutes of Health (U.S.) (fellowship F32 HG007805)
Hertz Foundation
2017-06-22T21:32:59Z
2017-06-22T21:32:59Z
2015-03
2014-07
Article
http://purl.org/eprint/type/JournalArticle
1061-4036
1546-1718
http://hdl.handle.net/1721.1/110185
Loh, Po-Ru, George Tucker, Brendan K Bulik-Sullivan, Bjarni J Vilhjálmsson, Hilary K Finucane, Rany M Salem, Daniel I Chasman, et al. “Efficient Bayesian Mixed-Model Analysis Increases Association Power in Large Cohorts.” Nat Genet 47, no. 3 (February 2, 2015): 284–290. © 2015 Nature America, Inc.
https://orcid.org/0000-0003-3864-9828
https://orcid.org/0000-0002-2724-7228
en_US
http://dx.doi.org/10.1038/ng.3190
Nature Genetics
Creative Commons Attribution-Noncommercial-Share Alike
http://creativecommons.org/licenses/by-nc-sa/4.0/
application/pdf
Nature Publishing Group
PMC
title Efficient Bayesian mixed-model analysis increases association power in large cohorts
url http://hdl.handle.net/1721.1/110185
https://orcid.org/0000-0003-3864-9828
https://orcid.org/0000-0002-2724-7228