Biobank-scale ancestral recombination graphs: inference and applications to the analysis of complex traits
Main Author: | Zhang, BC |
---|---|
Other Authors: | Palamara, P |
Format: | Thesis |
Language: | English |
Published: | 2022 |
Subjects: | Genetics--Statistical methods |
_version_ | 1826310120471003136 |
---|---|
author | Zhang, BC |
author2 | Palamara, P |
author_facet | Palamara, P Zhang, BC |
author_sort | Zhang, BC |
collection | OXFORD |
description | <p>Across living species, DNA is transmitted from generation to generation via the processes of inheritance, mutation, and recombination. The history of these processes can be recorded using genome-wide gene genealogies. Accurate inference of gene genealogies from genetic data has the potential to facilitate a wide range of analyses, but is computationally challenging. In this thesis, we introduce a scalable method, called ARG-Needle, that uses genotype hashing and a coalescent hidden Markov model to infer genome-wide genealogies from sequencing or genotyping array data in modern biobanks. We develop strategies that utilise the inferred genome-wide genealogies within linear mixed models to perform association and other analyses of biomedical traits.</p>
<p>We validate the accuracy and scalability of ARG-Needle through extensive coalescent simulations, and use ARG-Needle to build genome-wide genealogies from genotypes of 337,464 UK Biobank individuals. We perform genealogy-based association analysis of 7 complex traits, detecting more rare and ultra-rare signals (N = 133, frequency range 0.0004% − 0.1%) than genotype imputation from ∼65,000 sequenced haplotypes (N = 65). We validate these signals using exome sequencing data from 138,039 individuals. ARG-Needle associations strongly tag (average r = 0.72) underlying sequencing variants that are enriched for missense (2.3×) and loss-of-function (4.5×) variation. Compared to imputation, inferred genealogies also capture additional signals for higher frequency variants. These results demonstrate that biobank-scale inference of gene genealogies may be leveraged in the analysis of complex traits, complementing approaches that require the availability of large, population-specific sequencing panels.</p> |
first_indexed | 2024-03-07T07:45:56Z |
format | Thesis |
id | oxford-uuid:c4e8fb87-618b-4ca1-ad04-16f415ce5021 |
institution | University of Oxford |
language | English |
last_indexed | 2024-03-07T07:45:56Z |
publishDate | 2022 |
record_format | dspace |
title | Biobank-scale ancestral recombination graphs: inference and applications to the analysis of complex traits |
topic | Genetics--Statistical methods |