Summary: | <p>With the advent of Genome-Wide Association Studies (GWAS), researchers have gained unprecedented access to data with which to examine associations between genetic variants and complex traits and diseases. To analyze these studies, researchers have typically used measurements of Linkage Disequilibrium (LD) taken from the GWAS population or from a reference population, which may be genetically drifted relative to the GWAS population. However, these measurements are only point estimates and thus do not capture the stochastic nature of forces such as recombination and genetic drift. Moreover, point estimates from reference populations do not account for errors and inconsistencies in ancestral composition between the reference and GWAS populations. Therefore, in this thesis, we instead model LD as a distribution and demonstrate the extent to which modelling uncertainty in LD can improve the robustness and calibration of statistical genetic applications.</p>
<p>First, we characterise the distribution of LD and demonstrate that genetic drift between populations can result in a high degree of variation in LD measurements. Additionally, we demonstrate that this variation propagates downstream to the accuracy of statistical genetic applications, including summary statistic imputation and fine-mapping.</p>
<p>Secondly, to generate LD measurements in a genetically drifted population, we use a reference population and weight the individual reference haplotypes. We demonstrate how repeatedly sampling these weights provides a way to obtain the distribution of LD in the drifted population, and how the weights can be updated in a marginal fashion.</p>
<p>Thirdly, using GWAS summary statistics and a reference population, we derive a Maximum Likelihood Estimate (MLE) of both the level of noise in the GWAS population and the drift between the GWAS and reference populations.</p>
<p>Lastly, we show that repeatedly updating the weights on our reference haplotypes while incorporating GWAS summary statistic data yields a method to infer the in-sample distribution of LD from a GWAS. The benefits of our approach over using LD obtained from a reference population are illustrated in the context of fine-mapping in admixed populations.</p>
<p>We end by discussing more sophisticated methods for generating our weighted reference haplotypes, the ability to leverage distributions of LD in statistical genetic applications, and the effective and efficient sharing of LD distributions.</p>
|