Summary: | The genealogical history of a set of genomes is encoded by its ancestral recombination graph (ARG), a graph-like structure that represents ancestral recombination and coalescence events. ARGs have proven useful for a range of tasks in statistical and population genetics, but in many cases inferring an ARG from observed genotypes is a severe computational bottleneck. Here we present Threads, a highly scalable and accurate ARG inference algorithm. We show through extensive simulations that Threads is significantly more computationally efficient than other ARG inference methods while achieving high accuracy across a range of metrics. Using Threads, we explore new ARG-based applications in genotype compression, phasing, imputation, and association studies, in each case observing distinct benefits from the use of ARGs. Together, our results make a strong case for the ARG as a standard part of the statistical geneticist's toolkit, and Threads makes efficient and accurate ARG inference feasible for modern, biobank-scale datasets.
|