Analysis of new long-read sequencing data
The rapid development of powerful high-throughput sequencing technologies has enabled us to gain valuable insights into the complexities of the human transcriptome. In recent years, Oxford Nanopore has developed a new technology that can take RNA directly as the sequencing input and generates long rea...
Main Author: | Phoa, Yohanes Alfredo |
---|---|
Other Authors: | Kiah Han Mao |
Format: | Final Year Project (FYP) |
Language: | English |
Published: | 2019 |
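Subjects: | DRNTU::Science::Mathematics::Applied mathematics::Simulation and modeling; DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition |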
Online Access: | http://hdl.handle.net/10356/77170 |
_version_ | 1826111273470787584 |
---|---|
author | Phoa, Yohanes Alfredo |
author2 | Kiah Han Mao |
author_facet | Kiah Han Mao Phoa, Yohanes Alfredo |
author_sort | Phoa, Yohanes Alfredo |
collection | NTU |
description | The rapid development of powerful high-throughput sequencing technologies has enabled us to gain valuable insights into the complexities of the human transcriptome. In recent years, Oxford Nanopore has developed a new technology that takes RNA directly as the sequencing input and generates long reads. In this thesis, we use nanopore reads from synthetic RNA samples and employ machine-learning-based approaches to identify patterns that distinguish signals of modified RNA from those of the unmodified counterpart. First, we explored our dataset using a statistical test. We then proposed a simple baseline algorithm that learns the features distinguishing modified strands from unmodified strands. Finally, we proposed a novel method for detecting anomalies by sequence labeling using deep learning. |
first_indexed | 2024-10-01T02:47:58Z |
format | Final Year Project (FYP) |
id | ntu-10356/77170 |
institution | Nanyang Technological University |
language | English |
last_indexed | 2024-10-01T02:47:58Z |
publishDate | 2019 |
record_format | dspace |
spelling | ntu-10356/77170 2023-02-28T23:11:32Z Analysis of new long-read sequencing data Phoa, Yohanes Alfredo Kiah Han Mao School of Chemical and Biomedical Engineering Genomic Institute Singapore Tan Meng How DRNTU::Science::Mathematics::Applied mathematics::Simulation and modeling DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition The rapid development of powerful high-throughput sequencing technologies has enabled us to gain valuable insights into the complexities of the human transcriptome. In recent years, Oxford Nanopore has developed a new technology that takes RNA directly as the sequencing input and generates long reads. In this thesis, we use nanopore reads from synthetic RNA samples and employ machine-learning-based approaches to identify patterns that distinguish signals of modified RNA from those of the unmodified counterpart. First, we explored our dataset using a statistical test. We then proposed a simple baseline algorithm that learns the features distinguishing modified strands from unmodified strands. Finally, we proposed a novel method for detecting anomalies by sequence labeling using deep learning. Bachelor of Science in Mathematical Sciences 2019-05-14T13:59:06Z 2019-05-14T13:59:06Z 2019 Final Year Project (FYP) http://hdl.handle.net/10356/77170 en 20 p. application/pdf |
spellingShingle | DRNTU::Science::Mathematics::Applied mathematics::Simulation and modeling DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition Phoa, Yohanes Alfredo Analysis of new long-read sequencing data |
title | Analysis of new long-read sequencing data |
title_sort | analysis of new long read sequencing data |
topic | DRNTU::Science::Mathematics::Applied mathematics::Simulation and modeling DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition |
url | http://hdl.handle.net/10356/77170 |
work_keys_str_mv | AT phoayohanesalfredo analysisofnewlongreadsequencingdata |
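
The description mentions exploring the dataset with a statistical test but does not say which one. As a rough illustration only, the sketch below assumes Welch's two-sample t-test applied to a per-read summary of the current signal. The function name `welch_t`, the choice of test, and the synthetic numbers are all assumptions made for this example; none of them come from the thesis.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom.

    Hypothetical helper for illustration; the thesis does not name its test.
    """
    # Squared standard error contributed by each sample
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Synthetic stand-in data (NOT from the thesis): one mean current level per read
rng = random.Random(0)
unmodified = [rng.gauss(100.0, 2.0) for _ in range(200)]
modified = [rng.gauss(103.0, 2.0) for _ in range(200)]

t_stat, dof = welch_t(modified, unmodified)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A large absolute t statistic would suggest that the two groups of reads differ in their mean signal level at the position being examined. A rank-based test could be substituted if the signal is clearly non-Gaussian.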