Summary: | <p>Exploring the human genome benefits many fields, including molecular medicine, human evolution, population genetics and genetic epidemiology. The human genome contains around 3 billion base pairs, and approximately 99.9 percent of these base pairs are identical between any two individuals. Differences in the remaining 0.1 percent could explain the astonishing variety between individuals, not only in appearance but also in health.</p>
<p>Building upon this observation, large-scale genomic collections, through efforts like the NIH All of Us Research Program or the UK Biobank, have yielded data sets of hundreds of thousands of individuals and are expected to grow to millions in the coming years. Leveraging such data sets to understand disease and phenotypic variation requires understanding the fine-scale genetic relationships between individuals.</p>
<p>The work presented in this thesis spans three main chapters, comprising two published articles and one preprint, summarised as follows:</p>
<p><em>Chapter 2</em> - Nait Saada et al. (2020) focuses on detecting genetic relationships using “identical-by-descent” (IBD) genomic regions that are inherited from a common ancestor between purportedly “unrelated” pairs of individuals in a data set. IBD segments originating within the past 1,500 years are inferred between 487,409 UK Biobank individuals, revealing fine-scale population structure, signals of recent selection and ultra-rare variant associations.</p>
<p><em>Chapter 3</em> - Nait Saada et al. (2021) explores the use of deep neural networks for accurately predicting the time to the most recent common ancestor (TMRCA), or coalescence time, between individuals at a specific genomic location. Pairwise TMRCAs can be leveraged to estimate the age of observed genomic variants in a data set.</p>
<p><em>Chapter 4</em> - Nait Saada et al. (2022) infers the ages of ~80 million mutations across 26 diverse populations in the 1000 Genomes Project data set using deep learning. By leveraging allele age estimates, signatures of selection for individual variants and complex traits are identified.</p>
|