Inferring the fine-scale structure and evolution of recombination from high-throughput genome sequencing

Bibliographic Details
Main Author: Venn, O
Other Authors: McVean, G
Format: Thesis
Language: English
Published: 2013
Subjects: Mathematical genetics and bioinformatics (statistics); Medical Sciences
_version_ 1797090578968608768
author Venn, O
author2 McVean, G
author_facet McVean, G
Venn, O
author_sort Venn, O
collection OXFORD
description <p>In eukaryotes, recombination plays a critical role both in the production of viable gametes and as a population genetic process. Here, we are interested in studying recombination as it provides insight into a process that has shaped variation. To this end, we study the evolution of cross-over rates in chimpanzees and humans through two experiments.</p> <p>Components of the recombination machinery are well described in yeast and C. elegans, but less so in other species. In humans, cross-over rates vary across physical scales and occur predominantly in narrow ∼2 kb regions called hotspots, where hotspot usage differs considerably between individuals. Differential hotspot usage is associated with specific DNA motifs, and DNA-contacting zinc finger array variants in the trans-acting PRDM9 H3K4 trimethyltransferase. The precise relationship between DNA motifs, PRDM9 and hotspot activity is not completely understood.</p> <p>Experiment 1. To investigate the importance of PRDM9 motif recognition, which is predicted to be different between humans and chimpanzees, and the effect of PRDM9 on the evolution of fine-scale cross-over rates, we sequenced 10 unrelated Pan troglodytes verus (Western chimpanzee) genomes to moderate coverage (∼10×). I validate the approach by demonstrating that fine-scale maps estimated from 10 human genomes each of African and European ancestry recapitulate independently estimated maps. Then I characterise the error modes in sequencing data arising from errors in chemistry, alignment, variant calling, and genotyping. I identify several cryptic error modes missed by state-of-the-art filters and develop methods to counteract them.</p> <p>To guard against genotype error arising from stochastic variation in low to moderate coverage sequencing, I develop methods to incorporate the underlying statistical uncertainty into recombination analyses, evaluate the approaches through simulation (estimated 11% improvement) and empirical assessment (estimated 4% improvement), and discover that the reported genotype uncertainty is poorly calibrated, which limits the approaches. Consequently, a filtering approach was applied to the hard-called chimpanzee genotypes. I estimate recombination rates in chimpanzees through an existing LD-based method. In contrast to humans, there is no increased cross-over localisation around chimpanzee PRDM9 binding predictions, nor are there motifs consistently associated with activity. Hotspots do not overlap between the two species, indicating that rates evolved rapidly, consistent with PRDM9 localising all hotspots. In contrast, gene promoters and CpG islands are common attractors of recombination (2.7-fold increase in rate in chimpanzee, 1.5-fold increase in human), suggesting chromatin state influences hotspot placement but to varying degrees in the two species. I discuss the potential implications for PRDM9 mechanism.</p> <p>Experiment 2. To enable a more representative characterisation of the spectrum of genome changes occurring in chimpanzee genomes, I analyse data from an extended three-generation Western chimpanzee pedigree sequenced at high coverage (∼30×). I use Mendel transmission to filter variants, infer haplotypes, and identify recombination events through a Hidden Markov Model approach. We detect 375 recombination events, of which 3 are double cross-over events. Sex-specific recombination rate estimates in chimpanzees mirror sex differences in humans (N♂/N♀ = 0.58), and levels of total recombination are similar. We resolve recombination events typically at ∼856 base-pair resolution. Additionally, analyses of Mendel inconsistencies suggest that extended pedigree sequencing opens the door to studying complex genome changes.</p> <p>These experiments demonstrate the power of comparative analyses, the utility of high-throughput sequencing in enabling the study of recombination in almost any species of interest, the challenges in sifting signal from noise in these data, and the need for experimental and algorithmic methods to guard against error.</p>
first_indexed 2024-03-07T03:20:40Z
format Thesis
id oxford-uuid:b74f6706-a37d-4d71-975d-02e0f79ccdf1
institution University of Oxford
language English
last_indexed 2024-03-07T03:20:40Z
publishDate 2013
record_format dspace
spelling oxford-uuid:b74f6706-a37d-4d71-975d-02e0f79ccdf1 2022-03-27T04:47:38Z Inferring the fine-scale structure and evolution of recombination from high-throughput genome sequencing Thesis http://purl.org/coar/resource_type/c_db06 uuid:b74f6706-a37d-4d71-975d-02e0f79ccdf1 Mathematical genetics and bioinformatics (statistics) Medical Sciences English Oxford University Research Archive - Valet 2013 Venn, O McVean, G
spellingShingle Mathematical genetics and bioinformatics (statistics)
Medical Sciences
Venn, O
Inferring the fine-scale structure and evolution of recombination from high-throughput genome sequencing
title Inferring the fine-scale structure and evolution of recombination from high-throughput genome sequencing
title_sort inferring the fine scale structure and evolution of recombination from high throughput genome sequencing
topic Mathematical genetics and bioinformatics (statistics)
Medical Sciences