NCodR: A multi-class support vector machine classification to distinguish non-coding RNAs in Viridiplantae

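Non-coding RNAs (ncRNAs) are major players in the regulation of gene expression. This study analyses seven classes of ncRNAs in plants using sequence and secondary structure-based RNA folding measures. We observe distinct regions in the distribution of AU content along with overlapping regions for different ncRNA classes. Additionally, we find similar averages for minimum folding energy index across various ncRNA classes except for pre-miRNAs and lncRNAs. Various RNA folding measures show similar trends among the different ncRNA classes except for pre-miRNAs and lncRNAs. We observe different k-mer repeat signatures of length three among various ncRNA classes. However, in pre-miRNAs and lncRNAs, a diffuse pattern of k-mers is observed. Using these attributes, we train eight different classifiers to discriminate various ncRNA classes in plants. Support vector machines employing a radial basis function kernel show the highest accuracy (average F1 of ~96%) in discriminating ncRNAs, and the classifier is implemented as a web server, NCodR.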
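The abstract names three kinds of attributes: AU content, the minimum folding energy index (MFEI), and k-mers of length three. The sketch below shows one way such features could be computed; it is illustrative and is not the NCodR implementation. The function names are hypothetical, the MFEI formula is the commonly used (MFE / length x 100) / %GC definition rather than one confirmed by this record, and the minimum folding energy itself is assumed to come from an external RNA folding tool.

```python
from collections import Counter
from itertools import product


def au_content(seq: str) -> float:
    """Percentage of A and U bases in an RNA sequence (T is treated as U)."""
    s = seq.upper().replace("T", "U")
    if not s:
        return 0.0
    return 100.0 * sum(s.count(b) for b in "AU") / len(s)


def kmer_frequencies(seq: str, k: int = 3) -> dict:
    """Relative frequency of every overlapping k-mer over the alphabet ACGU."""
    s = seq.upper().replace("T", "U")
    windows = [s[i:i + k] for i in range(len(s) - k + 1)]
    # Windows containing ambiguous bases (e.g. N) are skipped.
    counts = Counter(w for w in windows if set(w) <= set("ACGU"))
    total = sum(counts.values())
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product("ACGU", repeat=k)
    }


def mfei(mfe_kcal_per_mol: float, seq: str) -> float:
    """Assumed definition: (MFE / length * 100) / %GC, sign of MFE preserved."""
    s = seq.upper().replace("T", "U")
    if not s:
        raise ValueError("MFEI is undefined for an empty sequence")
    gc_percent = 100.0 * sum(s.count(b) for b in "GC") / len(s)
    if gc_percent == 0:
        raise ValueError("MFEI is undefined for a sequence without G or C")
    adjusted_mfe = mfe_kcal_per_mol / len(s) * 100.0
    return adjusted_mfe / gc_percent


if __name__ == "__main__":
    example = "AUGCAUGCAU"  # toy sequence, not taken from the study
    print(au_content(example))
    print(kmer_frequencies(example)["AUG"])
    print(mfei(-4.0, example))  # -4.0 kcal/mol is a made-up folding energy
```

For k = 3 this yields a 64-dimensional frequency vector per sequence. Concatenated with AU content and folding measures, such a vector could serve as input to a classifier such as a support vector machine.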

Bibliographic Details
Main Authors: Chandran Nithin, Sunandan Mukherjee, Jolly Basak, Ranjit Prasad Bahadur
Format: Article
Language: English
Published: Cambridge University Press 2022-01-01
Series: Quantitative Plant Biology
Subjects: k-mer repeats; ncRNA prediction; non-coding RNA; RNA folding; SVM classifier
Online Access: https://www.cambridge.org/core/product/identifier/S2632882822000182/type/journal_article
author Chandran Nithin
Sunandan Mukherjee
Jolly Basak
Ranjit Prasad Bahadur
collection DOAJ
description Non-coding RNAs (ncRNAs) are major players in the regulation of gene expression. This study analyses seven classes of ncRNAs in plants using sequence and secondary structure-based RNA folding measures. We observe distinct regions in the distribution of AU content along with overlapping regions for different ncRNA classes. Additionally, we find similar averages for minimum folding energy index across various ncRNA classes except for pre-miRNAs and lncRNAs. Various RNA folding measures show similar trends among the different ncRNA classes except for pre-miRNAs and lncRNAs. We observe different k-mer repeat signatures of length three among various ncRNA classes. However, in pre-miRNAs and lncRNAs, a diffuse pattern of k-mers is observed. Using these attributes, we train eight different classifiers to discriminate various ncRNA classes in plants. Support vector machines employing a radial basis function kernel show the highest accuracy (average F1 of ~96%) in discriminating ncRNAs, and the classifier is implemented as a web server, NCodR.
first_indexed 2024-04-10T04:38:22Z
format Article
id doaj.art-2d7d4f70adf74558b41c9b24abcddcd6
institution Directory Open Access Journal
issn 2632-8828
language English
last_indexed 2024-04-10T04:38:22Z
publishDate 2022-01-01
publisher Cambridge University Press
record_format Article
series Quantitative Plant Biology
spelling doaj.art-2d7d4f70adf74558b41c9b24abcddcd6
2023-03-09T12:43:35Z
eng
Cambridge University Press
Quantitative Plant Biology
2632-8828
2022-01-01
3
10.1017/qpb.2022.18
Chandran Nithin [0] https://orcid.org/0000-0001-8212-6093
Sunandan Mukherjee [1] https://orcid.org/0000-0002-4361-0103
Jolly Basak [2]
Ranjit Prasad Bahadur [3] https://orcid.org/0000-0002-6705-1713
[0] Computational Structural Biology Lab, Department of Biotechnology, Indian Institute of Technology, Kharagpur 721302, India; Laboratory of Computational Biology, Faculty of Chemistry, Biological and Chemical Research Centre, University of Warsaw, 02-089 Warsaw, Poland
[1] Computational Structural Biology Lab, Department of Biotechnology, Indian Institute of Technology, Kharagpur 721302, India; Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, PL-02-109 Warsaw, Poland
[2] Department of Biotechnology, Visva-Bharati, Santiniketan, 731235, India
[3] Computational Structural Biology Lab, Department of Biotechnology, Indian Institute of Technology, Kharagpur 721302, India
title NCodR: A multi-class support vector machine classification to distinguish non-coding RNAs in Viridiplantae
topic k-mer repeats
ncRNA prediction
non-coding RNA
RNA folding
SVM classifier
url https://www.cambridge.org/core/product/identifier/S2632882822000182/type/journal_article