Prediction of MicroRNA Precursors Using Parsimonious Feature Sets

MicroRNAs (miRNAs) are a class of short noncoding RNAs that regulate gene expression through base pairing with messenger RNAs. Due to the interest in studying miRNA dysregulation in disease and limits of validated miRNA references, identification of novel miRNAs is a critical task. The performance o...

Bibliographic Details
Main Authors: Petra Stepanowsky, Eric Levy, Jihoon Kim, Xiaoqian Jiang, Lucila Ohno-Machado
Format: Article
Language: English
Published: SAGE Publishing 2014-01-01
Series: Cancer Informatics
Online Access: https://doi.org/10.4137/CIN.S13877
collection DOAJ
description MicroRNAs (miRNAs) are a class of short noncoding RNAs that regulate gene expression through base pairing with messenger RNAs. Due to the interest in studying miRNA dysregulation in disease and limits of validated miRNA references, identification of novel miRNAs is a critical task. The performance of different models to predict novel miRNAs varies with the features chosen as predictors. However, no study has systematically compared published feature sets. We constructed a comprehensive feature set using the minimum free energy of the secondary structure of precursor miRNAs, a set of nucleotide-structure triplets, and additional extracted sequence and structure characteristics. We then compared the predictive value of our comprehensive feature set to those from three previously published studies, using logistic regression and random forest classifiers. We found that classifiers containing as few as seven highly predictive features are able to predict novel precursor miRNAs as well as classifiers that use larger feature sets. In a real data set, our method correctly identified the holdout miRNAs relevant to renal cancer.
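The nucleotide-structure triplets mentioned in the description can be illustrated with a short sketch. It assumes one common definition from the miRNA prediction literature: for each interior position, the middle nucleotide is combined with the paired/unpaired state of three adjacent positions, giving 4 x 8 = 32 features. The record does not define the triplets, so this may differ from what the authors used. The function name `triplet_features` and the example hairpin are hypothetical, and the dot-bracket structure is assumed to come from an external folding tool.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
# 32 feature names: middle nucleotide + paired/unpaired pattern of 3 positions.
# Assumption: "(" marks any paired base (either bracket direction), "." unpaired.
TRIPLET_KEYS = [n + "".join(p) for n in NUCLEOTIDES for p in product("(.", repeat=3)]


def triplet_features(sequence, structure):
    """Return normalized triplet frequencies for a sequence and its
    dot-bracket secondary structure (hypothetical helper, not the
    authors' code)."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    seq = sequence.upper().replace("T", "U")
    # Collapse both bracket directions into a single "paired" symbol.
    pairing = structure.replace(")", "(")
    counts = dict.fromkeys(TRIPLET_KEYS, 0)
    total = 0
    for i in range(1, len(seq) - 1):
        key = seq[i] + pairing[i - 1:i + 2]
        if key in counts:  # skips ambiguous bases such as N
            counts[key] += 1
            total += 1
    if total == 0:
        return {k: 0.0 for k in TRIPLET_KEYS}
    return {k: v / total for k, v in counts.items()}


# Toy hairpin used only for illustration.
features = triplet_features("GGGAAACCC", "(((...)))")
```

A vector like this could be concatenated with the minimum free energy and other sequence or structure characteristics before being passed to a logistic regression or random forest classifier.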
first_indexed 2024-12-17T10:45:38Z
id doaj.art-c1052aefecd84fdaa6fd305010abb499
institution Directory Open Access Journal
issn 1176-9351
last_indexed 2024-12-17T10:45:38Z
volume 13s1
doi 10.4137/CIN.S13877
record timestamp 2022-12-21T21:52:08Z
affiliation Petra Stepanowsky: Bioinformatics Research Group, University of Applied Sciences, Upper Austria, Hagenberg, Austria.
affiliation Eric Levy: Division of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA.
affiliation Jihoon Kim: Division of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA.
affiliation Xiaoqian Jiang: Division of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA.
affiliation Lucila Ohno-Machado: Division of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA.