DNA sequence driven machine learning for modelling replication timing

Bibliographic Details
Main Author: Ashford, J
Other Authors: Sahakyan, A
Format: Thesis
Language: English
Published: 2023
Subjects:
Description
Summary: All human somatic cells copy their entire genome during mitotic replication, in the S-phase of the cell cycle. Replication timing (RT) is the temporal order of genome replication in S-phase and has been shown to have consistent global “profiles” across a wide range of tissues and diseases. We demonstrate that while there are many factors that influence the specific RT characteristics of individual cell types, there is a strong link between the DNA sequence composition and the overall RT behaviour. This is achieved by accurately modelling the aggregate profiles from 131 RT experiments constituting 56 unique human cell types, using only engineered features of the DNA sequences as input. We then derive insight into how the composition of DNA sequences impacts RT values by observing the effect of in silico sequence modifications on model predictions. We further extend our modelling towards cell-type-specific predictions with a single model by incorporating a minimal source of extra information, ATAC-seq, which provides context for chromatin organisation. The resulting machine learning models, along with the underlying exploratory data analyses and feature engineering, are useful for predicting RT and shed light on the DNA sequence basis of the replication phenomenon.
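The in silico modification approach mentioned in the summary can be illustrated with a minimal sketch. This record does not state which sequence features or model the thesis uses, so everything below is an assumption made for illustration: the dinucleotide-frequency features, the `toy_model` scoring function (a hypothetical stand-in for a trained RT regressor) and all function names. The sketch substitutes each base in turn and records how the prediction changes relative to the unmodified sequence.

```python
from itertools import product

# All 16 dinucleotides in a fixed lexicographic order (illustrative feature set,
# not necessarily the features engineered in the thesis).
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def kmer_features(seq, k=2):
    """Return normalised k-mer frequencies of seq in lexicographic k-mer order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing N or other symbols
            counts[window] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on short input
    return [counts[km] / total for km in kmers]


def toy_model(features):
    """Hypothetical stand-in for a trained RT model: weights each dinucleotide
    by its GC fraction. It expects dinucleotide (k=2) features."""
    weights = [sum(base in "GC" for base in km) / 2 for km in DINUCLEOTIDES]
    return sum(w * f for w, f in zip(weights, features))


def in_silico_effects(seq, predict, k=2):
    """Map (position, ref, alt) to the change in prediction caused by
    substituting ref with alt at that position."""
    baseline = predict(kmer_features(seq, k))
    effects = {}
    for i, ref in enumerate(seq):
        for alt in "ACGT":
            if alt == ref:
                continue
            mutated = seq[:i] + alt + seq[i + 1:]
            effects[(i, ref, alt)] = predict(kmer_features(mutated, k)) - baseline
    return effects


if __name__ == "__main__":
    effects = in_silico_effects("AAAA", toy_model)
    for key in sorted(effects):
        print(key, round(effects[key], 4))
```

With a real trained model in place of `toy_model`, substitutions with large positive or negative effects would point to the sequence positions that the model's prediction is most sensitive to.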