Patterns of selective constraint across the mouse and human genomes


Bibliographic Details
Main Author: Powell, G
Other Authors: Lindgren, C
Format: Thesis
Language: English
Published: 2022
Description
Summary: Understanding how genetic variation affects organism phenotype and fitness is a fundamental question in biology with implications for biomedical research. Constrained genomic loci (i.e. loci under selective constraint) are depleted of variation due to negative selection and are inferred to be functionally important to organism fitness. Recent sequencing in human populations has allowed researchers to make contemporary estimates of selective constraint across the genome. These estimates provide a basis for identifying functionally important elements and interpreting the fitness consequences of mutations. Mouse models are critical to our understanding of mammalian biology and are used to experimentally assess the function of genomic elements and genetic variation in vivo. However, how selective constraint shapes patterns of variation across the mouse genome remains relatively understudied.

In this thesis, I identify and characterise constrained genes and genomic regions in mice using single nucleotide variation from sequencing in wild populations (Mus musculus and Mus spretus). I quantify how local sequence context affects substitution probabilities across the mouse genome in order to control for differences in mutability when estimating selective constraint. I show that a nucleotide’s 7-mer context (i.e. the nucleotide together with the three bases flanking it on each side) explains more variance in substitution rates than either the 3-mer or 5-mer context. I then use these estimates of sequence mutability to identify constrained genes and genomic regions. The results show that many highly constrained genes are essential for embryonic development, and that mouse gene constraint is positively correlated with pleiotropy (i.e. a greater degree of constraint is correlated with an increased number of phenotype associations). Importantly, I also observe that the most constrained genes and genomic regions are enriched for variant sites associated with human disease, suggesting pathogenicity may have been conserved between the lineages.

I conclude by discussing how these results may help to interpret the functional importance of genomic elements, prioritise disease-associated loci for functional assessment, and facilitate more reliable inferences from mouse models.
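
To make the k-mer terminology in the summary concrete, the following is a minimal illustrative sketch of how the 3-mer, 5-mer and 7-mer contexts of a single nucleotide could be extracted from a sequence. It is not taken from the thesis: the function name `kmer_context`, the example sequence and the choice to return `None` near sequence ends are all assumptions made for illustration.

```python
from typing import Optional


def kmer_context(sequence: str, position: int, k: int = 7) -> Optional[str]:
    """Return the k-mer centred on a 0-based position (hypothetical helper).

    For k = 7 this is the focal nucleotide plus the three bases flanking it
    on each side. Returns None when the full context does not fit, e.g. at
    the ends of the sequence.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so the focal nucleotide is central")
    flank = k // 2
    start, end = position - flank, position + flank + 1
    if start < 0 or end > len(sequence):
        return None
    return sequence[start:end].upper()


# Made-up example sequence; index 5 is the focal 'G'.
seq = "ACGTTGCAAGCT"
focal = 5
contexts = {k: kmer_context(seq, focal, k) for k in (3, 5, 7)}
print(contexts)
```

Each wider context contains the narrower ones, so a model keyed on 7-mers can distinguish substitution probabilities that 3-mer and 5-mer models pool together.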