Identification of genome-wide non-canonical spliced regions and analysis of biological functions for spliced sequences using Read-Split-Fly

Abstract Background It is generally thought that most canonical or non-canonical splicing events involving U2- and U12 spliceosomes occur within nuclear pre-mRNAs. However, the question of whether at least some U12-type splicing occurs in the cytoplasm is still unclear. In recent years next-generati...

Bibliographic Details
Main Authors: Yongsheng Bai, Jeff Kinne, Lizhong Ding, Ethan C. Rath, Aaron Cox, Siva Dharman Naidu
Format: Article
Language: English
Published: BMC 2017-10-01
Series: BMC Bioinformatics
Subjects: Read-Split-Fly; Alternative splicing; Non-canonical; RNA-Seq; ENCODE
Online Access: http://link.springer.com/article/10.1186/s12859-017-1801-y
collection DOAJ
description Abstract
Background: It is generally thought that most canonical or non-canonical splicing events involving U2- and U12-type spliceosomes occur within nuclear pre-mRNAs. However, whether at least some U12-type splicing occurs in the cytoplasm remains unclear. In recent years, next-generation sequencing technologies have revolutionized the field. The “Read-Split-Walk” (RSW) and “Read-Split-Run” (RSR) methods were developed to identify genome-wide non-canonical spliced regions, including special events occurring in the cytoplasm. As large amounts of genome and transcriptome data have been generated, for example by the Encyclopedia of DNA Elements (ENCODE) project, we have developed a newer, more memory-efficient version of the algorithm, “Read-Split-Fly” (RSF), which can detect non-canonical spliced regions with higher sensitivity and improved speed. The RSF algorithm also outputs the spliced sequences for downstream biological function analysis.
Results: We used open-access ENCODE project RNA-Seq data to search spliced intron sequences against a U12-type spliced intron sequence database and examine whether some events carry potential signatures of U12-type splicing. The check was performed by searching spliced sequences against 5’ss and 3’ss sequences from U12DB, the well-known database of orthologous U12-type spliceosomal introns. Preliminary results from 70 ENCODE samples indicated that 5’ss with a U12-type signature are more frequent than those with a U2-type signature and are prevalent in non-canonical junctions reported by RSF. The selected spliced sequences were also searched against miRBase to elucidate their functionality. Preliminary results from 70 ENCODE samples show that several miRNAs are prevalent in the studied samples. Two of these, hsa-miR-1273 and hsa-miR-548, are associated in the literature with many diseases and cancers.
Conclusions: Our RSF pipeline is able to detect many possible junctions (especially those with a high RPKM) with very high overall accuracy and relatively high accuracy for novel junctions. We have incorporated useful features into the pipeline, such as handling variable-length read data and searching spliced sequences for splicing signatures and miRNA events. We suggest that RSF, a tool for identifying novel splicing events, is applicable to the study of a range of diseases across biological systems under different experimental conditions.
id doaj.art-9ab440cce9b4471d9780e380c7e91ff3
institution Directory Open Access Journal
issn 1471-2105
spelling doaj.art-9ab440cce9b4471d9780e380c7e91ff3; 2022-12-21T18:55:17Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2017-10-01; volume 18, issue S11, pages 37-48; 10.1186/s12859-017-1801-y
Yongsheng Bai (Department of Biology, Indiana State University)
Jeff Kinne (Department of Mathematics and Computer Science, Indiana State University)
Lizhong Ding (Department of Biology, Indiana State University)
Ethan C. Rath (Department of Biology, Indiana State University)
Aaron Cox (Department of Mathematics and Computer Science, Indiana State University)
Siva Dharman Naidu (Department of Mathematics and Computer Science, Indiana State University)
topic Read-Split-Fly
Alternative splicing
Non-canonical
RNA-Seq
ENCODE