Efficient approaches for large-scale GWAS with genotype uncertainty


Bibliographic Details
Main Authors: Jørsboe, E; Albrechtsen, A
Format: Journal article
Language: English
Published: Oxford University Press 2021
Description: Association studies based on genetic data from SNP-chip imputation or low-depth sequencing are a cost-efficient design for large-scale studies. We explore methods for performing association studies applicable to such genetic data and investigate how the choice of prior used when estimating genotype probabilities affects the association results. Our proposed method, ANGSD-asso’s latent model, treats the unobserved genotype as a latent variable in a generalized linear model framework. The software is implemented in C/C++ and can be run multi-threaded. ANGSD-asso is based on genotype probabilities, which can be estimated using either the sample allele frequency or the individual allele frequencies as a prior. We explore through simulations how genotype probability-based methods compare with using genetic dosages. Our simulations show that in a structured population, using the individual allele frequency prior gives better power than using the sample allele frequency prior. In scenarios where sequencing depth is correlated with the phenotype, ANGSD-asso’s latent model has higher statistical power and less bias than using dosages. When additional covariates are added to the linear model, ANGSD-asso’s latent model has higher statistical power and less bias than other methods that accommodate genotype uncertainty, while also being much faster. This is shown with imputed data from UK Biobank and with simulations.
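
The contrast the abstract draws between genetic dosages and a latent-genotype model can be illustrated with a minimal sketch. It assumes a linear model with normally distributed residuals and no covariates. The function names (`dosage`, `latent_loglik`, `dosage_loglik`) and the toy data are hypothetical, chosen for illustration only, and this is not ANGSD-asso's actual code or interface. The dosage approach plugs the expected genotype into the model, whereas the latent approach sums the phenotype likelihood over the three possible genotypes, weighted by their probabilities.

```python
import math


def dosage(gp):
    """Expected genotype E[G] from probabilities (P(G=0), P(G=1), P(G=2))."""
    return gp[1] + 2.0 * gp[2]


def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and sd at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def latent_loglik(ys, gps, alpha, beta, sd):
    """Log-likelihood treating the genotype as latent.

    For each individual, the phenotype density is a mixture over
    g in {0, 1, 2}, weighted by that individual's genotype probabilities.
    """
    total = 0.0
    for y, gp in zip(ys, gps):
        mixture = sum(p * normal_pdf(y, alpha + beta * g, sd)
                      for g, p in enumerate(gp))
        total += math.log(mixture)
    return total


def dosage_loglik(ys, gps, alpha, beta, sd):
    """Log-likelihood that plugs the expected genotype into the linear model."""
    total = 0.0
    for y, gp in zip(ys, gps):
        total += math.log(normal_pdf(y, alpha + beta * dosage(gp), sd))
    return total


# Toy data (hypothetical): three individuals with uncertain genotypes.
ys = [0.1, 1.2, 1.9]
gps = [(0.9, 0.1, 0.0), (0.2, 0.6, 0.2), (0.0, 0.3, 0.7)]

print(latent_loglik(ys, gps, alpha=0.0, beta=1.0, sd=1.0))
print(dosage_loglik(ys, gps, alpha=0.0, beta=1.0, sd=1.0))
```

When every genotype is known with certainty, the two log-likelihoods coincide; they differ only when the genotype probabilities are spread over more than one genotype.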
Record ID: oxford-uuid:80dbcbe0-78f6-4933-b7fb-b198836ca3e6
Institution: University of Oxford