The impact of feature selection on one and two-class classification performance for plant microRNAs
MicroRNAs (miRNAs) are short nucleotide sequences that form a typical hairpin structure which is recognized by a complex enzyme machinery. It ultimately leads to the incorporation of 18–24 nt long mature miRNAs into RISC where they act as recognition keys to aid in regulation of target mRNAs. It is...
Main Authors: | Waleed Khalifa, Malik Yousef, Müşerref Duygu Saçar Demirci, Jens Allmer |
---|---|
Format: | Article |
Language: | English |
Published: | PeerJ Inc., 2016-06-01 |
Series: | PeerJ |
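Subjects: | MicroRNA, Machine learning, Feature selection, Plant, One-class classification, Two-class classification |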
Online Access: | https://peerj.com/articles/2135.pdf |
_version_ | 1827607717279170560 |
---|---|
author | Waleed Khalifa, Malik Yousef, Müşerref Duygu Saçar Demirci, Jens Allmer |
collection | DOAJ |
description | MicroRNAs (miRNAs) are short nucleotide sequences that form a typical hairpin structure, which is recognized by a complex enzyme machinery. This ultimately leads to the incorporation of 18–24 nt long mature miRNAs into RISC, where they act as recognition keys to aid in the regulation of target mRNAs. Determining miRNAs experimentally is involved and, therefore, machine learning is used to complement such endeavors. The success of machine learning mostly depends on proper input data and appropriate features for parameterization of the data. Two-class classification (TCC) is generally used in the field, but because negative examples are hard to come by, one-class classification (OCC) has also been tried for pre-miRNA detection. Since both positive and negative examples are currently somewhat limited, feature selection can prove vital for furthering the field of pre-miRNA detection. In this study, we compare the performance of OCC and TCC using eight feature selection methods and seven different plant species providing positive pre-miRNA examples. Feature selection was very successful for OCC: the best feature selection method achieved an average accuracy of 95.6%, ∼29 percentage points better than the worst method, which achieved 66.9% accuracy. OCC performance is comparable to that of TCC, which performs up to 3% better than OCC; however, TCC is much less affected by feature selection, and its largest performance gap is ∼13%, which occurs for only two of the feature selection methodologies. We conclude that feature selection is crucially important for OCC and that OCC can perform on par with TCC given the proper set of features. |
first_indexed | 2024-03-09T06:59:12Z |
format | Article |
id | doaj.art-7ef638345c9742928f65562cf965ec1b |
institution | Directory of Open Access Journals |
issn | 2167-8359 |
language | English |
last_indexed | 2024-03-09T06:59:12Z |
publishDate | 2016-06-01 |
publisher | PeerJ Inc. |
record_format | Article |
series | PeerJ |
spelling | doaj.art-7ef638345c9742928f65562cf965ec1b; 2023-12-03T09:56:54Z; eng; PeerJ Inc.; PeerJ; 2167-8359; 2016-06-01; 4; e2135; 10.7717/peerj.2135; The impact of feature selection on one and two-class classification performance for plant microRNAs; Waleed Khalifa (Computer Science, The College of Sakhnin, Sakhnin, Israel); Malik Yousef (Computer Science, The College of Sakhnin, Sakhnin, Israel); Müşerref Duygu Saçar Demirci (Molecular Biology and Genetics, Izmir Institute of Technology, Urla, Izmir, Turkey); Jens Allmer (Molecular Biology and Genetics, Izmir Institute of Technology, Urla, Izmir, Turkey); https://peerj.com/articles/2135.pdf; MicroRNA; Machine learning; Feature selection; Plant; One-class classification; Two-class classification |
title | The impact of feature selection on one and two-class classification performance for plant microRNAs |
topic | MicroRNA, Machine learning, Feature selection, Plant, One-class classification, Two-class classification |
url | https://peerj.com/articles/2135.pdf |