Analysis of new long-read sequencing data

The rapid development of powerful high-throughput sequencing technologies has enabled us to gain valuable insights into the complexities of the human transcriptome. In recent years, Oxford Nanopore has developed a new technology that can take RNA directly as the sequencing input and generate long rea...

Bibliographic Details
Main Author: Phoa, Yohanes Alfredo
Other Authors: Kiah Han Mao
Format: Final Year Project (FYP)
Language: English
Published: 2019
Subjects:
Online Access: http://hdl.handle.net/10356/77170
_version_ 1826111273470787584
author Phoa, Yohanes Alfredo
author2 Kiah Han Mao
author_facet Kiah Han Mao
Phoa, Yohanes Alfredo
author_sort Phoa, Yohanes Alfredo
collection NTU
description The rapid development of powerful high-throughput sequencing technologies has enabled us to gain valuable insights into the complexities of the human transcriptome. In recent years, Oxford Nanopore has developed a new technology that can take RNA directly as the sequencing input and generate long reads. In this thesis, we use nanopore reading results from synthetic RNA samples and employ machine-learning-based approaches to identify patterns that distinguish signals of modified RNA readings from those of their unmodified counterparts. First, we explored our dataset using a statistical test. We then proposed a simple baseline algorithm that learns the distinguishing features between modified strands and unmodified strands. Finally, we proposed a novel method for detecting anomalies by sequence labeling using deep learning.
first_indexed 2024-10-01T02:47:58Z
format Final Year Project (FYP)
id ntu-10356/77170
institution Nanyang Technological University
language English
last_indexed 2024-10-01T02:47:58Z
publishDate 2019
record_format dspace
spelling ntu-10356/77170 2023-02-28T23:11:32Z Analysis of new long-read sequencing data Phoa, Yohanes Alfredo Kiah Han Mao School of Chemical and Biomedical Engineering Genomic Institute Singapore Tan Meng How DRNTU::Science::Mathematics::Applied mathematics::Simulation and modeling DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition The rapid development of powerful high-throughput sequencing technologies has enabled us to gain valuable insights into the complexities of the human transcriptome. In recent years, Oxford Nanopore has developed a new technology that can take RNA directly as the sequencing input and generate long reads. In this thesis, we use nanopore reading results from synthetic RNA samples and employ machine-learning-based approaches to identify patterns that distinguish signals of modified RNA readings from those of their unmodified counterparts. First, we explored our dataset using a statistical test. We then proposed a simple baseline algorithm that learns the distinguishing features between modified strands and unmodified strands. Finally, we proposed a novel method for detecting anomalies by sequence labeling using deep learning. Bachelor of Science in Mathematical Sciences 2019-05-14T13:59:06Z 2019-05-14T13:59:06Z 2019 Final Year Project (FYP) http://hdl.handle.net/10356/77170 en 20 p. application/pdf
spellingShingle DRNTU::Science::Mathematics::Applied mathematics::Simulation and modeling
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
Phoa, Yohanes Alfredo
Analysis of new long-read sequencing data
title Analysis of new long-read sequencing data
title_full Analysis of new long-read sequencing data
title_fullStr Analysis of new long-read sequencing data
title_full_unstemmed Analysis of new long-read sequencing data
title_short Analysis of new long-read sequencing data
title_sort analysis of new long read sequencing data
topic DRNTU::Science::Mathematics::Applied mathematics::Simulation and modeling
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
url http://hdl.handle.net/10356/77170
work_keys_str_mv AT phoayohanesalfredo analysisofnewlongreadsequencingdata