Summary: | A long-standing goal of biology is to understand how the 3 billion bases of DNA in each human cell contribute to molecular, cellular, and, ultimately, organismal function. Somatic mutations, which arise in cells during the course of life, are natural experiments that can be leveraged to provide insight into this profound question. This thesis develops computational methods to identify somatic mutations and infer their phenotypic relationships from population-scale genome sequencing. The methods are developed and applied in the context of two human diseases, autism spectrum disorder and cancer. First, we develop a suite of computational tools to detect somatic copy number variants (CNVs) that likely arose during early embryonic development. We apply this tool set to establish that such CNVs contribute substantially to the risk of developing autism spectrum disorder in a small number of carriers. Next, we develop a general-purpose method for modeling discrete stochastic processes at multiple resolutions. We demonstrate the utility of the method by modeling patterns of somatic mutations across the cancer genome. Finally, we extend and apply this method to map somatic mutation rates in 37 types of cancer and identify sets of mutations that likely drive cancer growth in both coding and noncoding regions of the genome. Broadly, this work demonstrates how the unique challenges of biological data can both inform and benefit from computational research.
|