Genomic data analyses for population history and population health
<p>Many of the patterns of genetic variation we observe today have arisen via the complex dynamics of interactions and isolation of historic human populations. In this thesis, we focus on two important features of the genetics of populations that can be used to learn about human history: popul...
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis |
Language: | English |
Published: |
2017
|
_version_ | 1797094404983357440 |
---|---|
author | Bycroft, C |
author2 | Donnelly, P |
author_facet | Donnelly, P Bycroft, C |
author_sort | Bycroft, C |
collection | OXFORD |
description | <p>Many of the patterns of genetic variation we observe today have arisen via the complex dynamics of interactions and isolation of historic human populations. In this thesis, we focus on two important features of the genetics of populations that can be used to learn about human history: population structure and admixture. The Iberian peninsula has a complex demographic history, as well as rich linguistic and cultural diversity. However, previous studies using small genomic regions (such as Y-chromosome and mtDNA) as well as genome-wide data have so far detected limited genetic structure in Iberia. Larger datasets and powerful new statistical methods that exploit information in the correlation structure of nearby genetic markers have made it possible to detect and characterise genetic differentiation at fine geographic scales. We performed the largest and most comprehensive study of Spanish population structure to date by analysing genotyping array data for &Tilde;1,400 Spanish individuals genotyped at &Tilde;700,000 polymorphic loci. We show that at broad scales, the major axis of genetic differentiation in Spain runs from west to east, while there is remarkable genetic similarity in the north-south direction. Our analysis also reveals striking patterns of geographically-localised and subtle population structure within Spain at scales down to tens of kilometres. We developed and applied new approaches to show how this structure has arisen from a complex and regionally-varying mix of genetic isolation and recent gene-flow within and from outside of Iberia. To further explore the genetic impact of historical migrations and invasions of Iberia, we assembled a data set of 2,920 individuals (&Tilde;300,000 markers) from Iberia and the surrounding regions of north Africa, Europe, and sub-Saharan Africa. Our admixture analysis implies that north African-like DNA in Iberia was mainly introduced in the earlier half (860 - 1120 CE) of the period of Muslim rule in Iberia, and we estimate that the closest modern-day equivalents to the initial migrants are located in Western Sahara. We also find that north African-like DNA in Iberia shows striking regional variation, with near-zero contributions in the Basque regions, low amounts (&Tilde;3%) in the north east of Iberia, and as high as (&Tilde;11%) in Galicia and Portugal.</p> <p>The UK Biobank project is a large prospective cohort study of &Tilde;500,000 individuals from across the United Kingdom, aged between 40-69 at recruitment. A rich variety of phenotypic and health-related information is available on each participant, making the resource unprecedented in its size and scope. Understanding the role that genetics plays in phenotypic variation, and its potential interactions with other factors, provides a critical route to a better understanding of human biology and population health. As such, a key component of the UK Biobank resource has been the collection of genome-wide genetic data (&Tilde;805,000 markers) on every participant using purpose-designed genotyping arrays. These data are the focus of the second part of this thesis. In particular, we designed and implemented a quality control (QC) pipeline on behalf of the current and future use of this multi-purpose resource. Genotype data on this scale offers novel opportunities for assessing quality issues, although the wide range of ancestral backgrounds in the cohort also creates particular challenges. We also conducted a set of analyses that reveal properties of the genetic data, including population structure and familial relatedness, that can be important for downstream analyses. We find that cryptic relatedness is common among UK Biobank participants (&Tilde;30% have at least one first cousin relative or closer), and a full range of human population structure is present in this cohort: from world-wide ancestral diversity to subtle population structure at sub-national geographic scales. Finally, we performed a genome-wide association scan on a well-studied and highly polygenic phenotype: standing height. This provided a further test of the effectiveness of our QC, as well as highlighting the potential of the resource to uncover novel regions of association.</p> |
first_indexed | 2024-03-07T04:13:38Z |
format | Thesis |
id | oxford-uuid:c8a76d94-ded6-4a16-b5af-09bbad6292a2 |
institution | University of Oxford |
language | English |
last_indexed | 2024-03-07T04:13:38Z |
publishDate | 2017 |
record_format | dspace |
spelling | oxford-uuid:c8a76d94-ded6-4a16-b5af-09bbad6292a22022-03-27T06:53:43ZGenomic data analyses for population history and population healthThesishttp://purl.org/coar/resource_type/c_db06uuid:c8a76d94-ded6-4a16-b5af-09bbad6292a2EnglishORA Deposit2017Bycroft, CDonnelly, PMyers, S<p>Many of the patterns of genetic variation we observe today have arisen via the complex dynamics of interactions and isolation of historic human populations. In this thesis, we focus on two important features of the genetics of populations that can be used to learn about human history: population structure and admixture. The Iberian peninsula has a complex demographic history, as well as rich linguistic and cultural diversity. However, previous studies using small genomic regions (such as Y-chromosome and mtDNA) as well as genome-wide data have so far detected limited genetic structure in Iberia. Larger datasets and powerful new statistical methods that exploit information in the correlation structure of nearby genetic markers have made it possible to detect and characterise genetic differentiation at fine geographic scales. We performed the largest and most comprehensive study of Spanish population structure to date by analysing genotyping array data for &Tilde;1,400 Spanish individuals genotyped at &Tilde;700,000 polymorphic loci. We show that at broad scales, the major axis of genetic differentiation in Spain runs from west to east, while there is remarkable genetic similarity in the north-south direction. Our analysis also reveals striking patterns of geographically-localised and subtle population structure within Spain at scales down to tens of kilometres. We developed and applied new approaches to show how this structure has arisen from a complex and regionally-varying mix of genetic isolation and recent gene-flow within and from outside of Iberia. To further explore the genetic impact of historical migrations and invasions of Iberia, we assembled a data set of 2,920 individuals (&Tilde;300,000 markers) from Iberia and the surrounding regions of north Africa, Europe, and sub-Saharan Africa. Our admixture analysis implies that north African-like DNA in Iberia was mainly introduced in the earlier half (860 - 1120 CE) of the period of Muslim rule in Iberia, and we estimate that the closest modern-day equivalents to the initial migrants are located in Western Sahara. We also find that north African-like DNA in Iberia shows striking regional variation, with near-zero contributions in the Basque regions, low amounts (&Tilde;3%) in the north east of Iberia, and as high as (&Tilde;11%) in Galicia and Portugal.</p> <p>The UK Biobank project is a large prospective cohort study of &Tilde;500,000 individuals from across the United Kingdom, aged between 40-69 at recruitment. A rich variety of phenotypic and health-related information is available on each participant, making the resource unprecedented in its size and scope. Understanding the role that genetics plays in phenotypic variation, and its potential interactions with other factors, provides a critical route to a better understanding of human biology and population health. As such, a key component of the UK Biobank resource has been the collection of genome-wide genetic data (&Tilde;805,000 markers) on every participant using purpose-designed genotyping arrays. These data are the focus of the second part of this thesis. In particular, we designed and implemented a quality control (QC) pipeline on behalf of the current and future use of this multi-purpose resource. Genotype data on this scale offers novel opportunities for assessing quality issues, although the wide range of ancestral backgrounds in the cohort also creates particular challenges. We also conducted a set of analyses that reveal properties of the genetic data, including population structure and familial relatedness, that can be important for downstream analyses. We find that cryptic relatedness is common among UK Biobank participants (&Tilde;30% have at least one first cousin relative or closer), and a full range of human population structure is present in this cohort: from world-wide ancestral diversity to subtle population structure at sub-national geographic scales. Finally, we performed a genome-wide association scan on a well-studied and highly polygenic phenotype: standing height. This provided a further test of the effectiveness of our QC, as well as highlighting the potential of the resource to uncover novel regions of association.</p> |
spellingShingle | Bycroft, C Genomic data analyses for population history and population health |
title | Genomic data analyses for population history and population health |
title_full | Genomic data analyses for population history and population health |
title_fullStr | Genomic data analyses for population history and population health |
title_full_unstemmed | Genomic data analyses for population history and population health |
title_short | Genomic data analyses for population history and population health |
title_sort | genomic data analyses for population history and population health |
work_keys_str_mv | AT bycroftc genomicdataanalysesforpopulationhistoryandpopulationhealth |