Genealogy estimation for thousands of samples

Bibliographic Details
Main Author: Speidel, L
Other Authors: Myers, S
Format: Thesis
Language: English
Published: 2019
Description:

A key and fundamental concept that captures our shared genetic history is the genealogy, which traces the genetic relationships of present-day individuals to their most-recent common ancestors. Knowledge of the genealogy would, in principle, capture all evolutionary forces that modified the genetic material ancestral to our DNA, and would hence simplify - and enhance - many inference problems about past demography and evolution. Despite their importance, estimation of genealogies has remained unsolved even for moderately sized data sets, with existing methods unable to handle sample sizes beyond a few hundred samples, yet modern data sets often exceed tens of thousands of samples.

In this thesis, I present a method, Relate, that estimates such genealogies for thousands of samples. I demonstrate on a variety of population genetic applications that Relate-based inferences improve in accuracy, resolution, or statistical power on state-of-the-art alternatives. I then reconstruct the genealogy of 2478 humans from 26 populations. I infer historical population sizes and population split times with higher resolution than previously possible and identify highly diverged lineages, reflecting Neanderthal and Denisovan introgression in non-Africans, and unknown events in Africans. I report regions that show evidence of being under strong positive selection that were previously unreported and identify multi-allelic traits likely to be under selection. I additionally apply Relate to 50 wild mice sampled in France, India, and Taiwan and demonstrate that the estimated genealogies contain rich information about their demographic history, mutation rate trends consistent with GC biased gene conversion, as well as strong indications of selective sweeps in each population.
Identifier: oxford-uuid:61e3f8d0-6911-461d-92ea-ee91559cf353
Institution: University of Oxford
Subjects: Statistics, Genetics