Using machine learning to discover candidate localised transcripts from microscopy and genome-wide bioinformatics data


Bibliographic Details
Main Author: Kiourlappou, M
Other Authors: Davis, I
Format: Thesis
Language: English
Published: 2023
Subjects: Biochemistry; Supervised learning (Machine learning)
_version_ 1811139478004170752
author Kiourlappou, M
author2 Davis, I
author_facet Davis, I
Kiourlappou, M
author_sort Kiourlappou, M
collection OXFORD
description <p>Messenger RNA (mRNA) localisation is an important layer of post-transcriptional regulation. In polarised cells such as glia, mRNA localisation is especially significant because glial cells are long, with protrusions that can be far from the cell body, where mRNA is produced. Thus, mRNA transport and localised translation are essential to ensure that mRNAs and proteins reach their intended destination in a timely and coordinated manner. Proper mRNA localisation is vital for the correct functioning of the whole glial cell, and dysregulation of mRNA localisation in glial cells has been implicated in neurodegenerative and neurological diseases and in some types of cancer. Therefore, a deeper understanding of the mechanisms that mediate mRNA localisation is of great value.</p> <p>Experimental techniques such as single molecule fluorescence in-situ hybridization (smFISH) and RNA sequencing (RNA-seq) are often used to study mRNA localisation: the former provides a visual map of the mRNAs’ spatial distribution and expression, and the latter measures their expression in more quantitative detail in single tissues or bulk tissue. In this thesis, I study mRNA localisation using computational pipelines that systematically interrogate the datasets available for mRNA localisation in glia, and I use supervised machine learning models to predict candidate genes that could be localised in glial cell protrusions.</p> <p>Firstly, I present a YFP protein trap microscopy screen conducted collaboratively by members of the Davis lab for 200 genes in the <em>Drosophila melanogaster</em> larval nervous system. In this screen we explore the transcriptomic and proteomic spatial distribution of the 200 genes using smFISH, across seven compartments of the larval nervous system. Next, I present the innovative pipeline I use to systematically manage the raw microscopy images and to create figures (approximately 1400 figures). 
The next step in the pipeline is a scoring application, which I co-produced in-house with Dr David Pinto, for expert scorers to annotate the figures. Three expert scorers use the application to annotate the figures with labels describing mRNA and protein expression patterns and levels. Once the labels are produced, I present the figures using a novel visualisation tool that allows a user to explore the images alongside the labels describing localisation for each gene and the different compartments examined for each gene. Additionally, I incorporate genome-wide bioinformatic datasets so that a user can use the visualisation tool to discover associations between the figures, the localisation labels and the bioinformatics datasets for different genes or groups of genes.</p> <p>Next, I focus on one tissue of interest: glial cells. I create localisation labels for <em>Drosophila melanogaster</em> and <em>Mus musculus</em> using the smFISH microscopy screen and analysed RNA-seq datasets identified from the literature by Dr Jeff Lee and Dalia Gala. I then assemble a collection of heterogeneous datasets, all related to mRNA localisation, and transform them into interoperable, compatible, gene-centric datasets ready to be used by machine learning algorithms.</p> <p>Finally, I explore different types of machine learning algorithms to assess their suitability for my datasets. I choose Random Forest (RF) as the most compatible algorithm and adapt a pipeline based on RF that was developed for sleep candidate gene selection. Using the adapted pipeline, I predict 66 genes highly likely to be localised in glial cell protrusions and assess their functional and cellular attributes using GO enrichment. I find that the group of predicted genes is highly enriched in terms involving localisation, transport and cell protrusions. 
Additionally, I discover that the most influential factors in training the RF model are the target site motifs of RNA binding proteins (RBPs). I further investigate the specific RBPs that contribute the most to the model and find that a large percentage of them have established roles in mRNA localisation and RNA methylation.</p> <p>The work completed in this thesis is an exciting starting point for using novel visualisation tools to explore data and machine learning techniques to study mRNA localisation. The results described in the previous paragraph are the first indication of the success of a machine learning approach.</p>
first_indexed 2024-09-25T04:06:43Z
format Thesis
id oxford-uuid:38adaa84-696c-4393-ad36-82048b2f4957
institution University of Oxford
language English
last_indexed 2024-09-25T04:06:43Z
publishDate 2023
record_format dspace
spelling oxford-uuid:38adaa84-696c-4393-ad36-82048b2f4957; 2024-06-04T09:54:02Z; Using machine learning to discover candidate localised transcripts from microscopy and genome-wide bioinformatics data; Thesis; http://purl.org/coar/resource_type/c_db06; uuid:38adaa84-696c-4393-ad36-82048b2f4957; Biochemistry; Supervised learning (Machine learning); English; Hyrax Deposit; 2023; Kiourlappou, M; Davis, I; Taylor, S; Hamilton, R; Baker, R
spellingShingle Biochemistry
Supervised learning (Machine learning)
Kiourlappou, M
Using machine learning to discover candidate localised transcripts from microscopy and genome-wide bioinformatics data
title Using machine learning to discover candidate localised transcripts from microscopy and genome-wide bioinformatics data
title_full Using machine learning to discover candidate localised transcripts from microscopy and genome-wide bioinformatics data
title_fullStr Using machine learning to discover candidate localised transcripts from microscopy and genome-wide bioinformatics data
title_full_unstemmed Using machine learning to discover candidate localised transcripts from microscopy and genome-wide bioinformatics data
title_short Using machine learning to discover candidate localised transcripts from microscopy and genome-wide bioinformatics data
title_sort using machine learning to discover candidate localised transcripts from microscopy and genome wide bioinformatics data
topic Biochemistry
Supervised learning (Machine learning)
work_keys_str_mv AT kiourlappoum usingmachinelearningtodiscovercandidatelocalisedtranscriptsfrommicroscopyandgenomewidebioinformaticsdata