Genealogy estimation for thousands of samples
Main Author: | Speidel, L |
---|---|
Other Authors: | Myers, S |
Format: | Thesis |
Language: | English |
Published: | 2019 |
Subjects: | Statistics; Genetics |
_version_ | 1797071928314298368 |
---|---|
author | Speidel, L |
author2 | Myers, S |
author_facet | Myers, S Speidel, L |
author_sort | Speidel, L |
collection | OXFORD |
description | <p>A fundamental concept that captures our shared genetic history is the genealogy, which traces the genetic relationships of present-day individuals to their most-recent common ancestors. Knowledge of the genealogy would, in principle, capture all evolutionary forces that modified the genetic material ancestral to our DNA, and would hence simplify - and enhance - many inference problems about past demography and evolution. Despite their importance, estimation of genealogies has remained unsolved even for moderately sized data sets: existing methods cannot handle more than a few hundred samples, whereas modern data sets often exceed tens of thousands of samples.</p>
<p>In this thesis, I present a method, Relate, that estimates such genealogies for thousands of samples. I demonstrate on a variety of population genetic applications that Relate-based inferences improve on state-of-the-art alternatives in accuracy, resolution, or statistical power. I then reconstruct the genealogy of 2478 humans from 26 populations. I infer historical population sizes and population split times with higher resolution than previously possible and identify highly diverged lineages, reflecting Neanderthal and Denisovan introgression in non-Africans, and unknown events in Africans. I report previously unreported regions that show evidence of strong positive selection and identify multi-allelic traits likely to be under selection. I additionally apply Relate to 50 wild mice sampled in France, India, and Taiwan and demonstrate that the estimated genealogies contain rich information about their demographic history, mutation rate trends consistent with GC-biased gene conversion, as well as strong indications of selective sweeps in each population.</p> |
first_indexed | 2024-03-06T23:00:20Z |
format | Thesis |
id | oxford-uuid:61e3f8d0-6911-461d-92ea-ee91559cf353 |
institution | University of Oxford |
language | English |
last_indexed | 2024-03-06T23:00:20Z |
publishDate | 2019 |
record_format | dspace |
spelling | oxford-uuid:61e3f8d0-6911-461d-92ea-ee91559cf353 2022-03-26T18:02:47Z Genealogy estimation for thousands of samples Thesis http://purl.org/coar/resource_type/c_db06 uuid:61e3f8d0-6911-461d-92ea-ee91559cf353 Statistics Genetics English ORA Deposit 2019 Speidel, L Myers, S |
spellingShingle | Statistics Genetics Speidel, L Genealogy estimation for thousands of samples |
title | Genealogy estimation for thousands of samples |
title_full | Genealogy estimation for thousands of samples |
title_fullStr | Genealogy estimation for thousands of samples |
title_full_unstemmed | Genealogy estimation for thousands of samples |
title_short | Genealogy estimation for thousands of samples |
title_sort | genealogy estimation for thousands of samples |
topic | Statistics Genetics |
work_keys_str_mv | AT speidell genealogyestimationforthousandsofsamples |