Multi-resolution modeling of a discrete stochastic process identifies causes of cancer
Detection of cancer-causing mutations within the vast and mostly unexplored human genome is a major challenge. Doing so requires modeling the background mutation rate, a highly non-stationary stochastic process, across regions of interest varying in size from one to millions of positions. Here, we p...
Main Author: | Yaari, Adam Uri |
---|---|
Other Authors: | Berger, Bonnie |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2022 |
Online Access: | https://hdl.handle.net/1721.1/139334 |
_version_ | 1811069495042637824 |
---|---|
author | Yaari, Adam Uri |
author2 | Berger, Bonnie |
collection | MIT |
description | Detection of cancer-causing mutations within the vast and mostly unexplored human genome is a major challenge. Doing so requires modeling the background mutation rate, a highly non-stationary stochastic process, across regions of interest varying in size from one to millions of positions. Here, we present the split-Poisson-Gamma (SPG) distribution, an extension of the classical Poisson-Gamma formulation, to model a discrete stochastic process at multiple resolutions. We demonstrate that the probability model has a closed-form posterior, enabling efficient and accurate linear-time prediction over any length scale after the parameters of the model have been inferred a single time. We apply our framework to model mutation rates in tumors and show that model parameters can be accurately inferred from high-dimensional epigenetic data using a convolutional neural network, Gaussian process, and maximum-likelihood estimation. Our method is both more accurate and more efficient than existing models over a large range of length scales. We demonstrate the usefulness of multi-resolution modeling by detecting genomic elements that drive tumor emergence and are of vastly differing sizes. |
first_indexed | 2024-09-23T08:11:21Z |
format | Thesis |
id | mit-1721.1/139334 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T08:11:21Z |
publishDate | 2022 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/139334; 2022-01-15T04:08:47Z; Yaari, Adam Uri; Berger, Bonnie; Katz, Boris; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; S.M.; 2022-01-14T15:04:43Z; 2022-01-14T15:04:43Z; 2021-06; 2021-06-24T19:42:23.070Z; Thesis; https://hdl.handle.net/1721.1/139334; In Copyright - Educational Use Permitted; Copyright MIT; http://rightsstatements.org/page/InC-EDU/1.0/; application/pdf; Massachusetts Institute of Technology |
title | Multi-resolution modeling of a discrete stochastic process identifies causes of cancer |