Identification of a dinucleotide signature that discriminates coding from non-coding long RNAs

To date, the main criterion by which long ncRNAs (lncRNAs) are discriminated from mRNAs is the capacity of the transcript to encode a protein. However, it is becoming important to identify non-ORF-based sequence characteristics that can be used to distinguish between ncRNAs and mRNAs. In this study,...

Bibliographic Details
Main Authors: Damien Ulveling, Marcel E Dinger, Claire Francastel, Florent Hubé
Format: Article
Language: English
Published: Frontiers Media S.A. 2014-09-01
Series: Frontiers in Genetics
Subjects: database, mRNA, ncRNA, exon, pseudogene, intron
Online Access: http://journal.frontiersin.org/Journal/10.3389/fgene.2014.00316/full
collection DOAJ
description To date, the main criterion by which long ncRNAs (lncRNAs) are discriminated from mRNAs is the capacity of the transcript to encode a protein. However, it is becoming important to identify non-ORF-based sequence characteristics that can be used to distinguish between ncRNAs and mRNAs. In this study, we first established an extremely selective workflow to define a highly refined database of lncRNAs, which was used for comparison with mRNAs. Then, using this highly selective collection of lncRNAs, we found that CG dinucleotide frequencies were clearly distinct between lncRNAs and mRNAs. In addition, we showed that the bias in CG dinucleotide frequency was conserved in the human and mouse genomes. We propose that this sequence feature will serve as a useful classifier in transcript classification pipelines. We also suggest that our refined database of ‘bona fide’ lncRNAs will be valuable for the discovery of other sequence characteristics distinct to lncRNAs.
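As an illustration of the sequence feature named in the description, the following minimal Python sketch computes the relative frequency of the CG dinucleotide in a transcript. It is not the authors' pipeline: this record does not say how they counted or normalized dinucleotides, so the overlapping-window count, the function names `dinucleotide_frequencies` and `cg_frequency`, and the handling of ambiguous bases are all assumptions made for illustration.

```python
from collections import Counter


def dinucleotide_frequencies(seq: str) -> dict:
    """Relative frequency of each overlapping dinucleotide in seq.

    Assumption: RNA 'U' is treated as DNA 'T', and any window that
    contains a character other than A, C, G, T is skipped.
    """
    seq = seq.upper().replace("U", "T")
    # Overlapping windows of length 2: positions (0,1), (1,2), ...
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if set(p) <= set("ACGT")]
    total = len(pairs)
    if total == 0:
        return {}
    counts = Counter(pairs)
    return {pair: count / total for pair, count in counts.items()}


def cg_frequency(seq: str) -> float:
    """Relative frequency of the CG dinucleotide (0.0 if absent)."""
    return dinucleotide_frequencies(seq).get("CG", 0.0)
```

In a classification setting, a value such as `cg_frequency(transcript)` would be one feature among others; any decision threshold would have to be fitted on labeled lncRNA and mRNA sets and is not given in this record.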
first_indexed 2024-04-12T05:15:28Z
format Article
id doaj.art-fca375387c184f06952888cbc77bdb02
institution Directory Open Access Journal
issn 1664-8021
language English
last_indexed 2024-04-12T05:15:28Z
publishDate 2014-09-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Genetics
doi 10.3389/fgene.2014.00316
volume 5
affiliation Damien Ulveling: UMR7216 Epigenetics and Cell Fate
Marcel E Dinger: The University of Queensland Diamantina Institute
Claire Francastel: UMR7216 Epigenetics and Cell Fate
Florent Hubé: UMR7216 Epigenetics and Cell Fate
topic database
mRNA
ncRNA
exon
pseudogene
intron