DNA sequence driven machine learning for modelling replication timing

Bibliographic Details
Main Author: Ashford, J
Other Authors: Sahakyan, A
Format: Thesis
Language: English
Published: 2023
Subjects: Machine learning; Genetics; Medical sciences; Artificial intelligence
author Ashford, J
author2 Sahakyan, A
collection OXFORD
description All human somatic cells copy their entire genome during mitotic replication, in the S-phase of the cell cycle. Replication timing (RT) is the temporal order of genome replication in S-phase and has been shown to have consistent global “profiles” across a wide range of tissues and diseases. We demonstrate that while there are many factors that influence the specific RT characteristics of individual cell types, there is a strong link between the DNA sequence composition and the overall RT behaviour. This is achieved by accurately modelling the aggregate profiles from 131 RT experiments constituting 56 unique human cell types, using only engineered features of the DNA sequences as input. We then derive insight into how the composition of DNA sequences impacts RT values, by observing the impact of in silico sequence modifications on model predictions. We further extend our modelling towards cell-type specific predictions with a single model by incorporating a minimal source of extra information, ATAC-seq, which provides context for chromatin organisation. The obtained machine learning models, along with the underlying exploratory data analyses and feature engineering, are both useful for prediction of RT and shed light on the underlying DNA sequence basis of the replication phenomenon.
first_indexed 2024-03-07T08:23:58Z
format Thesis
id oxford-uuid:f6607437-3e49-4fa1-bef2-123a76cce097
institution University of Oxford
language English
last_indexed 2024-03-07T08:23:58Z
publishDate 2023
record_format dspace
spelling oxford-uuid:f6607437-3e49-4fa1-bef2-123a76cce097; 2024-02-07T16:59:55Z; DNA sequence driven machine learning for modelling replication timing; Thesis; http://purl.org/coar/resource_type/c_db06; uuid:f6607437-3e49-4fa1-bef2-123a76cce097; Machine learning; Genetics; Medical sciences; Artificial intelligence; English; Hyrax Deposit; 2023; Ashford, J; Sahakyan, A; DeBruijn, M
title DNA sequence driven machine learning for modelling replication timing
topic Machine learning
Genetics
Medical sciences
Artificial intelligence