Scalable approaches to inference and analysis of genome-wide genealogies

Bibliographic Details
Main Author: Gunnarsson, ÁF
Other Authors: Palamara, P
Format: Thesis
Language: English
Published: 2023
Description
Summary: The genealogical history of a set of genomes is encoded by its ancestral recombination graph (ARG), a graph-like structure that represents ancestral recombination and coalescence events. ARGs have proven a useful tool for a range of tasks in statistical and population genetics, but in many cases the inference of an ARG from observed genotypes presents a strict computational bottleneck. In this text we present Threads, a highly scalable and accurate ARG inference algorithm. We show through extensive simulations that Threads is significantly more computationally efficient than other ARG inference methods, while achieving high accuracy across a range of metrics. Using Threads, we explore new ARG-based applications in genotype compression, phasing, imputation, and association studies, in each case observing unique benefits obtained through the use of ARGs. Together, our results present a strong argument in favor of the ARG as a standard feature of the statistical geneticist's toolkit, and with Threads, efficient and accurate inference of ARGs is made feasible for modern, biobank-scale datasets.