Biobank-scale ancestral recombination graphs: inference and applications to the analysis of complex traits

Bibliographic Details
Main Author: Zhang, BC
Other Authors: Palamara, P
Format: Thesis
Language: English
Published: 2022
Subjects: Genetics--Statistical methods
_version_ 1826310120471003136
author Zhang, BC
author2 Palamara, P
author_facet Palamara, P
Zhang, BC
author_sort Zhang, BC
collection OXFORD
description <p>Across living species, DNA is transmitted from generation to generation via the processes of inheritance, mutation, and recombination. The history of these processes can be recorded using genome-wide gene genealogies. Accurate inference of gene genealogies from genetic data has the potential to facilitate a wide range of analyses, but is computationally challenging. In this thesis, we introduce a scalable method, called ARG-Needle, that uses genotype hashing and a coalescent hidden Markov model to infer genome-wide genealogies from sequencing or genotyping array data in modern biobanks. We develop strategies that utilise the inferred genome-wide genealogies within linear mixed models to perform association and other analyses of biomedical traits.</p> <p>We validate the accuracy and scalability of ARG-Needle through extensive coalescent simulations, and use ARG-Needle to build genome-wide genealogies from genotypes of 337,464 UK Biobank individuals. We perform genealogy-based association analysis of 7 complex traits, detecting more rare and ultra-rare signals (N = 133, frequency range 0.0004% − 0.1%) than genotype imputation from ∼65,000 sequenced haplotypes (N = 65). We validate these signals using exome sequencing data from 138,039 individuals. ARG-Needle associations strongly tag (average r = 0.72) underlying sequencing variants that are enriched for missense (2.3×) and loss-of-function (4.5×) variation. Compared to imputation, inferred genealogies also capture additional signals for higher frequency variants. These results demonstrate that biobank-scale inference of gene genealogies may be leveraged in the analysis of complex traits, complementing approaches that require the availability of large, population-specific sequencing panels.</p>
first_indexed 2024-03-07T07:45:56Z
format Thesis
id oxford-uuid:c4e8fb87-618b-4ca1-ad04-16f415ce5021
institution University of Oxford
language English
last_indexed 2024-03-07T07:45:56Z
publishDate 2022
record_format dspace
title Biobank-scale ancestral recombination graphs: inference and applications to the analysis of complex traits
topic Genetics--Statistical methods