DNA sequence driven machine learning for modelling replication timing
All human somatic cells copy their entire genome during mitotic replication, in the S-phase of the cell cycle. Replication timing (RT) is the temporal order of genome replication in S-phase and has been shown to have consistent global “profiles” across a wide range of tissues and diseases. We demon...
Main Author: | Ashford, J |
---|---|
Other Authors: | Sahakyan, A |
Format: | Thesis |
Language: | English |
Published: | 2023 |
Subjects: | Machine learning; Genetics; Medical sciences; Artificial intelligence |
_version_ | 1797112419630186496 |
---|---|
author | Ashford, J |
author2 | Sahakyan, A |
author_facet | Sahakyan, A Ashford, J |
author_sort | Ashford, J |
collection | OXFORD |
description | All human somatic cells copy their entire genome during mitotic replication, in the S-phase of the cell cycle. Replication timing (RT) is the temporal order of genome replication in S-phase and has been shown to have consistent global “profiles” across a wide range of tissues and diseases. We demonstrate that while there are many factors that influence the specific RT characteristics of individual cell types, there is a strong link between the DNA sequence composition and the overall RT behaviour. This is achieved by accurately modelling the aggregate profiles from 131 RT experiments constituting 56 unique human cell types, using only engineered features of the DNA sequences as input. We then derive insight into how the composition of DNA sequences impacts RT values, by observing the impact of in silico sequence modifications on model predictions. We further extend our modelling towards cell-type specific predictions with a single model by incorporating a minimal source of extra information, ATAC-seq, which provides context for chromatin organisation. The obtained machine learning models, along with the underlying exploratory data analyses and feature engineering, are both useful for prediction of RT and shed light on the underlying DNA sequence basis of the replication phenomenon. |
first_indexed | 2024-03-07T08:23:58Z |
format | Thesis |
id | oxford-uuid:f6607437-3e49-4fa1-bef2-123a76cce097 |
institution | University of Oxford |
language | English |
last_indexed | 2024-03-07T08:23:58Z |
publishDate | 2023 |
record_format | dspace |
spelling | oxford-uuid:f6607437-3e49-4fa1-bef2-123a76cce097; 2024-02-07T16:59:55Z; DNA sequence driven machine learning for modelling replication timing; Thesis; http://purl.org/coar/resource_type/c_db06; uuid:f6607437-3e49-4fa1-bef2-123a76cce097; Machine learning; Genetics; Medical sciences; Artificial intelligence; English; Hyrax Deposit; 2023; Ashford, J; Sahakyan, A; DeBruijn, M |
title | DNA sequence driven machine learning for modelling replication timing |
topic | Machine learning Genetics Medical sciences Artificial intelligence |