Inferring the fine-scale structure and evolution of recombination from high-throughput genome sequencing



Bibliographic Details
Main Author: Venn, O
Other Authors: McVean, G
Format: Thesis
Language: English
Published: 2013
Description
Summary:<p>In eukaryotes, recombination plays a critical role both in the production of viable gametes and as a population genetic process. Here, we focus on recombination because it provides insight into a process that has shaped genetic variation. To this end, we study the evolution of cross-over rates in chimpanzees and humans through two experiments.</p> <p>Components of the recombination machinery are well described in yeast and C. elegans, but less so in other species. In humans, cross-over rates vary across physical scales, and cross-overs occur predominantly in narrow (∼2 kb) regions called hotspots, whose usage differs considerably between individuals. Differential hotspot usage is associated with specific DNA motifs and with variants of the DNA-contacting zinc finger array of PRDM9, a trans-acting H3K4 trimethyltransferase. The precise relationship between DNA motifs, PRDM9 and hotspot activity is not fully understood.</p> <p>Experiment 1. To investigate the importance of PRDM9 motif recognition, which is predicted to differ between humans and chimpanzees, and the effect of PRDM9 on the evolution of fine-scale cross-over rates, we sequenced 10 unrelated Pan troglodytes verus (Western chimpanzee) genomes to moderate coverage (∼10×). I validate the approach by demonstrating that fine-scale maps estimated from 10 human genomes of African ancestry and 10 of European ancestry recapitulate independently estimated maps. I then characterise the error modes in the sequencing data that arise from the chemistry, alignment, variant calling, and genotyping steps.
I identify several cryptic error modes missed by state-of-the-art filters and develop methods to counteract them.</p> <p>To guard against genotype error arising from stochastic variation in low- to moderate-coverage sequencing, I develop methods that incorporate the underlying statistical uncertainty into recombination analyses and evaluate them through simulation (estimated 11% improvement) and empirical assessment (estimated 4% improvement). However, I find that the reported genotype uncertainty is poorly calibrated, which limits these approaches. Consequently, I apply a filtering approach to the hard-called chimpanzee genotypes. I then estimate chimpanzee recombination rates using an existing LD-based method. In contrast to humans, chimpanzees show no increased localisation of cross-overs around predicted PRDM9 binding sites, nor any motifs consistently associated with hotspot activity. Hotspots do not overlap between the two species, indicating that fine-scale rates have evolved rapidly, consistent with PRDM9 localising all hotspots. In contrast, gene promoters and CpG islands attract recombination in both species (a 2.7-fold increase in rate in chimpanzees and a 1.5-fold increase in humans), suggesting that chromatin state influences hotspot placement, though to a different degree in each species. I discuss the potential implications for the mechanism of PRDM9.</p> <p>Experiment 2. To obtain a more representative characterisation of the spectrum of changes occurring in chimpanzee genomes, I analyse data from an extended three-generation Western chimpanzee pedigree sequenced to high coverage (∼30×). I use Mendelian transmission to filter variants and infer haplotypes, and I identify recombination events with a Hidden Markov Model approach. We detect 375 recombination events, of which 3 are double cross-over events. Sex-specific recombination rate estimates in chimpanzees mirror the sex difference seen in humans (N♂/N♀ = 0.58), and total recombination is at a similar level in the two species.
We typically resolve recombination events to ∼856 bp. Additionally, analyses of Mendelian inconsistencies suggest that extended-pedigree sequencing opens the door to studying complex genome changes.</p> <p>These experiments demonstrate the power of comparative analyses, the utility of high-throughput sequencing for studying recombination in almost any species of interest, the challenge of separating signal from noise in these data, and the need for experimental and algorithmic safeguards against error.</p>
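The pedigree analysis summarised above identifies recombination events with a Hidden Markov Model, but the abstract does not specify that model. The following is only a minimal Python sketch under stated assumptions, not the thesis's method. It uses a two-state model in which the hidden state is the parental haplotype transmitted to the child. Observations are a hypothetical 0/1 encoding of which parental haplotype the child's allele matches at phase-informative sites. A constant per-interval switch probability (`switch_prob`) and a per-site error probability (`error_prob`), both assumed values, stand in for distance-dependent rates and a real genotype-error model. The function name `viterbi_crossovers` is illustrative. Viterbi decoding returns the most likely path, and each change of state is reported as an interval between the flanking informative sites.

```python
import math
from typing import List, Tuple


def viterbi_crossovers(
    obs: List[int],
    positions: List[int],
    switch_prob: float = 1e-3,   # assumed constant per-interval switch probability
    error_prob: float = 0.01,    # assumed per-site genotype/phasing error probability
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Decode which parental haplotype a child inherited along a chromosome.

    obs[i] is a hypothetical encoding: 0 if the child's transmitted allele at
    informative site i matches parental haplotype 0, 1 if it matches haplotype 1.
    Returns the Viterbi state path and the (left, right) position interval
    bracketing each inferred cross-over.
    """
    if not obs:
        return [], []
    log_stay, log_switch = math.log(1 - switch_prob), math.log(switch_prob)

    def log_emit(state: int, o: int) -> float:
        # A mismatch between hidden state and observation is explained as error.
        return math.log(1 - error_prob) if state == o else math.log(error_prob)

    # score[s] = best log-probability of any state path ending in state s.
    score = [math.log(0.5) + log_emit(s, obs[0]) for s in (0, 1)]
    back: List[List[int]] = []  # back[i - 1][s] = best predecessor of state s at site i
    for o in obs[1:]:
        new_score, ptr = [0.0, 0.0], [0, 0]
        for s in (0, 1):
            stay, switch = score[s] + log_stay, score[1 - s] + log_switch
            ptr[s] = s if stay >= switch else 1 - s
            new_score[s] = max(stay, switch) + log_emit(s, o)
        score = new_score
        back.append(ptr)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()

    # A cross-over can only be placed between the flanking informative sites,
    # so sparser markers give wider intervals.
    intervals = [
        (positions[i - 1], positions[i])
        for i in range(1, len(path))
        if path[i] != path[i - 1]
    ]
    return path, intervals


if __name__ == "__main__":
    # Toy data: one isolated error at site 3 and a single switch between sites 9 and 10.
    obs = [0] * 10 + [1] * 10
    obs[3] = 1
    path, intervals = viterbi_crossovers(obs, [100 * i for i in range(20)])
    print(intervals)  # [(900, 1000)]
```

In the toy run, the single mismatch at site 3 is absorbed as error because two extra switches cost more than one mismatch under these parameters. The sustained change from site 10 onward is called as one cross-over bracketed by positions 900 and 1000. Resolution in this kind of approach is therefore limited by the spacing of informative markers.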