The impact of feature selection on one and two-class classification performance for plant microRNAs

MicroRNAs (miRNAs) are short nucleotide sequences that form a typical hairpin structure which is recognized by a complex enzyme machinery. It ultimately leads to the incorporation of 18–24 nt long mature miRNAs into RISC where they act as recognition keys to aid in regulation of target mRNAs. It is...

Bibliographic Details
Main Authors: Waleed Khalifa, Malik Yousef, Müşerref Duygu Saçar Demirci, Jens Allmer
Format: Article
Language: English
Published: PeerJ Inc. 2016-06-01
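Series: PeerJ
Subjects: MicroRNA; Machine learning; Feature selection; Plant; One-class classification; Two-class classification
Online Access: https://peerj.com/articles/2135.pdf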
author Waleed Khalifa
Malik Yousef
Müşerref Duygu Saçar Demirci
Jens Allmer
collection DOAJ
description MicroRNAs (miRNAs) are short nucleotide sequences that form a typical hairpin structure, which is recognized by a complex enzyme machinery. This recognition ultimately leads to the incorporation of 18–24 nt long mature miRNAs into RISC, where they act as recognition keys to aid in the regulation of target mRNAs. Determining miRNAs experimentally is involved and, therefore, machine learning is used to complement such endeavors. The success of machine learning mostly depends on proper input data and appropriate features for parameterization of the data. Two-class classification (TCC) is generally used in the field, but because negative examples are hard to come by, one-class classification (OCC) has also been tried for pre-miRNA detection. Since both positive and negative examples are currently somewhat limited, feature selection can prove vital for furthering the field of pre-miRNA detection. In this study, we compare the performance of OCC and TCC using eight feature selection methods and seven different plant species providing positive pre-miRNA examples. Feature selection was very successful for OCC: the best feature selection method achieved an average accuracy of 95.6%, ∼29 percentage points better than the worst method, which achieved 66.9% accuracy. OCC performance is comparable to that of TCC, which performs up to 3% better than OCC; however, TCC is much less affected by feature selection, and its largest performance gap is ∼13%, which occurs for only two of the feature selection methodologies. We conclude that feature selection is crucially important for OCC and that OCC can perform on par with TCC given the proper set of features.
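The comparison described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the data are synthetic (two informative features plus four noise features), the OCC is a simple per-feature Gaussian model with a distance threshold fitted on positives only, the TCC is a Gaussian naive Bayes model, and fixed feature subsets stand in for the output of a feature selection method. It does not reproduce the study's classifiers, features, selection methods, or accuracies.

```python
import math
import random
import statistics


def make_samples(rng, n, informative_mean):
    """Synthetic rows: two informative features, then four pure-noise features."""
    return [
        [rng.gauss(informative_mean, 0.25) for _ in range(2)]
        + [rng.gauss(0.0, 3.0) for _ in range(4)]
        for _ in range(n)
    ]


def fit_gaussian(rows, features):
    """Per-feature (mean, stdev) over the chosen feature subset."""
    params = []
    for f in features:
        column = [row[f] for row in rows]
        params.append((statistics.mean(column), statistics.stdev(column)))
    return params


def mean_sq_z(row, features, params):
    """Mean squared z-score of a row under the fitted per-feature Gaussians."""
    total = sum(((row[f] - m) / s) ** 2 for f, (m, s) in zip(features, params))
    return total / len(features)


def log_lik(row, features, params):
    """Log-likelihood (up to a constant) under independent Gaussians."""
    return sum(
        -math.log(s) - 0.5 * ((row[f] - m) / s) ** 2
        for f, (m, s) in zip(features, params)
    )


def train_occ(pos, features, keep=0.95):
    """One-class model: fitted on positives only, accepts rows near them."""
    params = fit_gaussian(pos, features)
    scores = sorted(mean_sq_z(row, features, params) for row in pos)
    threshold = scores[int(keep * (len(scores) - 1))]
    return lambda row: mean_sq_z(row, features, params) <= threshold


def train_tcc(pos, neg, features):
    """Two-class model: Gaussian naive Bayes with equal class priors."""
    p_params = fit_gaussian(pos, features)
    n_params = fit_gaussian(neg, features)
    return lambda row: log_lik(row, features, p_params) > log_lik(row, features, n_params)


def accuracy(predict, pos, neg):
    """Fraction of positives accepted plus negatives rejected."""
    correct = sum(predict(row) for row in pos) + sum(not predict(row) for row in neg)
    return correct / (len(pos) + len(neg))


def compare(seed=0):
    """Return {subset name: (OCC accuracy, TCC accuracy)} on held-out synthetic data."""
    rng = random.Random(seed)
    train_pos, train_neg = make_samples(rng, 200, 1.0), make_samples(rng, 200, 0.0)
    test_pos, test_neg = make_samples(rng, 200, 1.0), make_samples(rng, 200, 0.0)
    subsets = {"informative": [0, 1], "noise": [2, 3, 4, 5], "all": [0, 1, 2, 3, 4, 5]}
    results = {}
    for name, features in subsets.items():
        occ = train_occ(train_pos, features)
        tcc = train_tcc(train_pos, train_neg, features)
        results[name] = (
            accuracy(occ, test_pos, test_neg),
            accuracy(tcc, test_pos, test_neg),
        )
    return results


if __name__ == "__main__":
    for name, (occ_acc, tcc_acc) in compare().items():
        print(f"{name:12s} OCC={occ_acc:.3f} TCC={tcc_acc:.3f}")
```

Running the script prints one accuracy pair per feature subset, which makes it easy to see how strongly each classifier type depends on the features it is given in this toy setting.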
first_indexed 2024-03-09T06:59:12Z
format Article
id doaj.art-7ef638345c9742928f65562cf965ec1b
institution Directory Open Access Journal
issn 2167-8359
language English
last_indexed 2024-03-09T06:59:12Z
publishDate 2016-06-01
publisher PeerJ Inc.
record_format Article
series PeerJ
spelling doaj.art-7ef638345c9742928f65562cf965ec1b
2023-12-03T09:56:54Z
eng
PeerJ Inc.
PeerJ
2167-8359
2016-06-01
4
e2135
10.7717/peerj.2135
The impact of feature selection on one and two-class classification performance for plant microRNAs
Waleed Khalifa (Computer Science, The College of Sakhnin, Sakhnin, Israel)
Malik Yousef (Computer Science, The College of Sakhnin, Sakhnin, Israel)
Müşerref Duygu Saçar Demirci (Molecular Biology and Genetics, Izmir Institute of Technology, Urla, Izmir, Turkey)
Jens Allmer (Molecular Biology and Genetics, Izmir Institute of Technology, Urla, Izmir, Turkey)
https://peerj.com/articles/2135.pdf
MicroRNA
Machine learning
Feature selection
Plant
One-class classification
Two-class classification
title The impact of feature selection on one and two-class classification performance for plant microRNAs
topic MicroRNA
Machine learning
Feature selection
Plant
One-class classification
Two-class classification
url https://peerj.com/articles/2135.pdf