Multi-resolution modeling of a discrete stochastic process identifies causes of cancer

Bibliographic Details
Main Author: Yaari, Adam Uri
Other Authors: Berger, Bonnie
Format: Thesis
Published: Massachusetts Institute of Technology, 2022
Online Access: https://hdl.handle.net/1721.1/139334
Collection: MIT
Description: Detection of cancer-causing mutations within the vast and mostly unexplored human genome is a major challenge. Doing so requires modeling the background mutation rate, a highly non-stationary stochastic process, across regions of interest varying in size from one to millions of positions. Here, we present the split-Poisson-Gamma (SPG) distribution, an extension of the classical Poisson-Gamma formulation, to model a discrete stochastic process at multiple resolutions. We demonstrate that the probability model has a closed-form posterior, enabling efficient and accurate linear-time prediction over any length scale after the parameters of the model have been inferred a single time. We apply our framework to model mutation rates in tumors and show that model parameters can be accurately inferred from high-dimensional epigenetic data using a convolutional neural network, Gaussian process, and maximum-likelihood estimation. Our method is both more accurate and more efficient than existing models over a large range of length scales. We demonstrate the usefulness of multi-resolution modeling by detecting genomic elements that drive tumor emergence and are of vastly differing sizes.
Record ID: mit-1721.1/139334
Contributors: Berger, Bonnie; Katz, Boris
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: S.M.
Dates: 2021-06; 2021-06-24T19:42:23.070Z; 2022-01-14T15:04:43Z
Rights: In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/
File format: application/pdf
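
The split-Poisson-Gamma (SPG) model described in the abstract extends the classical Poisson-Gamma formulation. The Python sketch below illustrates only the textbook facts behind that classical formulation. It is not the thesis's SPG model or code. It assumes a region's rate λ is Gamma-distributed with shape `alpha` and scale `theta`, and that the region's event count is Poisson given λ. Under those assumptions the count is marginally negative binomial, and the posterior over λ is again a Gamma distribution in closed form. The `weights` argument is a hypothetical per-position share of the region's rate, assumed to sum to 1. By the Poisson splitting property, the count in any sub-window is again Gamma-Poisson with scale `theta` times the summed weight, so a window of n positions is scored in O(n) time. All function names and parameters are illustrative.

```python
import math
from typing import Sequence, Tuple


def gamma_poisson_log_pmf(k: int, alpha: float, theta: float) -> float:
    """Log P(Y = k) when Y | lam ~ Poisson(lam), lam ~ Gamma(shape=alpha, scale=theta).

    Integrating out lam gives a negative binomial with r = alpha and
    p = theta / (1 + theta).
    """
    if theta == 0.0:
        # Zero expected rate: all probability mass sits on k = 0.
        return 0.0 if k == 0 else -math.inf
    p = theta / (1.0 + theta)
    return (math.lgamma(k + alpha) - math.lgamma(alpha) - math.lgamma(k + 1)
            + alpha * math.log1p(-p) + k * math.log(p))


def gamma_posterior(y: int, alpha: float, theta: float) -> Tuple[float, float]:
    """Conjugate update after observing y events with unit exposure.

    Returns (shape, scale) of the Gamma posterior over lam.
    """
    return alpha + y, theta / (1.0 + theta)


def window_log_pmf(k: int, alpha: float, theta: float,
                   weights: Sequence[float], start: int, end: int) -> float:
    """Log-probability of k events in positions [start, end) of one region.

    Hypothetical assumption: each event in the region lands on position i
    independently with probability weights[i]. Given lam, the window count is
    then Poisson(lam * w), and lam * w ~ Gamma(alpha, theta * w), so the
    marginal stays Gamma-Poisson with a rescaled scale parameter.
    """
    w = sum(weights[start:end])  # single O(end - start) pass over the window
    return gamma_poisson_log_pmf(k, alpha, theta * w)


if __name__ == "__main__":
    alpha, theta = 2.0, 1.5
    weights = [0.25] * 4  # four equally weighted positions (illustrative)
    print(math.exp(gamma_poisson_log_pmf(3, alpha, theta)))           # whole region
    print(math.exp(window_log_pmf(3, alpha, theta, weights, 0, 2)))   # half the region
    print(gamma_posterior(4, alpha, theta))
```

In this example the two-position window is scored as Gamma-Poisson with scale `theta * 0.5`. Replacing the slice sum with a prefix-sum array would make each window query O(1) after one linear pass. This is one way a single set of region-level parameters could serve many length scales.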