Data-driven Mechanistic Modeling of 3D Human Genome

Bibliographic Details
Main Author: Qi, Yifeng
Other Authors: Zhang, Bin
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/143360
Description
Summary: The three-dimensional (3D) organization of the human genome regulates DNA-templated processes, including gene transcription, gene regulation, and DNA replication, which are crucial for cell differentiation and cell function. Computational modeling is an efficient and effective way to build high-resolution 3D genome structures and to improve our understanding of these molecular processes. My PhD research has focused on developing a data-driven, mechanistic modeling framework that aims to clarify the physical principles of how the genome is organized, as well as the mechanisms of biological processes coupled to genome structure, such as the coalescence of nuclear bodies. This thesis is organized as follows. In the first chapter, we introduce a computational model to simulate chromatin structure and dynamics. The model defines chromatin states from one-dimensional genomics and epigenomics data and quantitatively learns the interaction patterns between these states from experimental contact data. Once trained, the model can make de novo predictions of 3D chromatin structures at five-kilobase resolution across different cell types. The manuscript associated with this study is published in PLoS Computational Biology, 15.6, e1007024 (2019). In the second chapter, we expand the spatial scale of the model to study the organization of the whole diploid human genome within the entire nucleus. The model is both data-driven and mechanistic: the energy function is written out explicitly based on biologically motivated hypotheses, and all parameters are quantitatively derived from experimental contact data. The model has proven useful both for reconstructing whole-genome structures and for exploring the physical and biological principles of genome organization. The manuscript associated with this study is published in Biophysical Journal, 119, 1905 (2020).
In the third chapter, we apply the data-driven modeling framework to study the thermodynamics and kinetics of the formation and coalescence of nuclear bodies. Our study suggests that protein-chromatin interactions facilitate the nucleation of droplets but hinder their coarsening because the motion of droplets is correlated with that of the chromatin network: as droplets coalesce, the chromatin network becomes increasingly constrained, which is entropically unfavorable. Protein-chromatin interactions therefore arrest phase separation in multi-droplet states and may drive the variation in nuclear body numbers across cell types. The manuscript associated with this study is published in Nature Communications, 12, 1 (2021).