Ancestral trees as weighted networks: scalable screening for genome wide association studies

Bibliographic Details
Main Author: Christ, R
Other Authors: Aslett, L
Format: Thesis
Published: 2017
Description
Summary: Several haplotype-based methods have been developed to identify loci where multiple mutations of low to moderate frequency and effect size modulate disease susceptibility. Most such approaches either do not scale to hundreds of thousands of genomes or do not explicitly model recombination and the block-like structure of haplotypes. Using a novel checkpointing technique and a vectorized implementation, with a C core, of a hidden Markov model (HMM) based on the Li & Stephens model, we obtain, at each single nucleotide polymorphism (SNP) along the genome, a local genetic distance between all pairs of haplotypes in a phased dataset. To rapidly test this local distance matrix for association with a phenotype, we derive two finite-sample, central-limit-type theorems for quadratic forms. These theorems require no assumption on the matrix other than that it is free of outliers, a property for which we have an easily calculable, formal condition. We combine these results with a novel concentration inequality for Gaussian quadratic forms to bound p-values for quadratic forms from above and below while avoiding a full eigendecomposition of each matrix. Applying our HMM implementation and quadratic form screening method, we recover known loci associated with malaria susceptibility and uncover new potential associations in a pilot dataset of 6,136 haplotypes.
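As a rough illustration of what testing a matrix against a phenotype through a quadratic form can look like, the sketch below computes q = yᵀMy and a plain normal approximation to its upper-tail p-value using only the trace and the Frobenius norm, so no eigendecomposition is needed. This is not the thesis's theorems or bounds: the function name `quadratic_form_z`, the assumption that the phenotype `y` is already standardized and approximately standard Gaussian under the null, and the use of a simple moment-matched normal approximation are all assumptions made here for illustration.

```python
import math
import numpy as np

def quadratic_form_z(y, M):
    """Hypothetical helper: quadratic form statistic with a normal approximation.

    Assumes (for illustration only) that under the null y ~ N(0, I), so that
    for symmetric M: E[y'My] = tr(M) and Var[y'My] = 2 * ||M||_F^2.
    """
    y = np.asarray(y, dtype=float)
    M = np.asarray(M, dtype=float)
    M = 0.5 * (M + M.T)                    # symmetrize; leaves y'My unchanged
    q = float(y @ M @ y)                   # observed quadratic form
    mean = float(np.trace(M))              # null mean
    var = 2.0 * float(np.sum(M * M))       # null variance via Frobenius norm
    z = (q - mean) / math.sqrt(var)        # standardized statistic
    p_upper = 0.5 * math.erfc(z / math.sqrt(2.0))  # normal upper-tail probability
    return q, z, p_upper

# Toy usage with a made-up 2 x 2 matrix standing in for a local distance matrix.
q, z, p = quadratic_form_z([1.0, 2.0], [[0.0, 1.0], [1.0, 0.0]])
```

A normal approximation of this kind can be poor when a few eigenvalues dominate the matrix, which is the sort of situation an outlier condition and explicit p-value bounds, as described in the summary, are meant to address.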