sigFeature: an R-package for significant feature selection using SVM-RFE and t-statistic

Bibliographic Details
Main Authors: Pijush Das, Susanta Roychoudhury, Sucheta Tripathy
Format: Article
Language: English
Published: Science Planet Inc., 2017-10-01
Series: Canadian Journal of Biotechnology
Online Access: https://www.canadianjbiotech.com/CAN_J_BIOTECH/Archives/v1/Special Issue/cjb.2017-a22.pdf

Description

Depending on the sub-site of the primary tumour, up to thirty percent of patients with clinically and radiologically node-negative head and neck squamous cell carcinoma (HNSCC) may have occult metastases. Consequently, up to seventy percent of patients with node-negative neck disease currently receive unnecessary therapy so that the minority who are truly at risk are treated [1]. Treatment of HNSCC involves surgery, radiotherapy, or multimodality therapy such as surgery combined with adjuvant radiotherapy or chemoradiotherapy. HNSCC is typically regarded as a homogeneous, histopathologically identical tumour group, but these tumours are often genetically disparate and show variable biological behaviour and response to treatment both between and within anatomical sub-sites [2]. Treatment decisions for patients with HNSCC are still based on clinical, radiological and pathological parameters; no molecular markers are used for treatment decisions outside ongoing research protocols. To help identify the patients who are truly at risk, this study introduces a novel feature selection method based on gene expression data.

Feature selection for classification is a very active research area in data mining and machine learning. Its aim is to select, from a larger pool, a small subset of features that yields not only good classification performance but also biologically meaningful insights. Among wrapper methods, support vector machine recursive feature elimination (SVM-RFE) is recognised as one of the most effective. SVM-RFE is, however, a greedy method: it seeks the feature combination that best supports classification without considering whether individual features differ significantly between the classes. To overcome this limitation, the proposed approach combines SVM-RFE with the t-statistic to find features that are differentially significant between the classes while still achieving good classification performance.
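The abstract builds on SVM-RFE without spelling out the elimination loop. The sketch below shows the generic SVM-RFE procedure in Python: train a linear SVM, remove the feature with the smallest squared weight w_i², and repeat. It is a minimal illustration under stated assumptions, not sigFeature's R implementation: the Pegasos-style sub-gradient trainer, the constant-feature bias, the one-feature-per-step schedule and the names `train_linear_svm` and `svm_rfe` are hypothetical choices, and the t-statistic component is deliberately omitted because the abstract does not state how it is combined with the SVM weights.

```python
# Generic SVM-RFE sketch in pure Python (illustrative only).
# Assumptions, not taken from the sigFeature paper: a linear SVM trained
# with full-batch Pegasos-style sub-gradient descent, the bias handled by
# appending a constant feature, and one feature removed per iteration.


def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Return weights of a linear SVM (hinge loss + L2 penalty).

    Labels must be +1/-1. The last weight belongs to the appended
    constant feature and acts as a (regularised) bias term."""
    Xa = [list(row) + [1.0] for row in X]    # append constant for the bias
    n, d = len(Xa), len(Xa[0])
    w = [0.0] * d
    for t in range(1, epochs + 1):
        eta = 1.0 / (lam * t)                # Pegasos step size
        grad = [lam * wj for wj in w]        # gradient of the L2 term
        for xi, yi in zip(Xa, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:                   # hinge loss is active for this sample
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - eta * gj for wj, gj in zip(w, grad)]
    return w


def svm_rfe(X, y, lam=0.01, epochs=200):
    """Rank features by recursive feature elimination.

    Returns column indices ordered from most to least important."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(sub, y, lam, epochs)[:-1]  # drop the bias weight
        scores = [wj * wj for wj in w]                  # ranking criterion w_i^2
        worst = scores.index(min(scores))               # greedy: least useful feature now
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]                             # last removed = most important


if __name__ == "__main__":
    # Toy data: only column 0 separates the two classes.
    X = [[2, 1, 1], [2, -1, -1], [-2, 1, -1], [-2, -1, 1]]
    y = [1, 1, -1, -1]
    print(svm_rfe(X, y))  # prints [0, 2, 1]
```

On the toy data, only feature 0 separates the classes, so it is removed last and ranked first. Because each step keeps only what helps the current classifier, the ranking says nothing about whether a feature differs significantly between classes, which is the limitation the abstract sets out to address.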
The results obtained on six publicly available microarray datasets are promising and demonstrate the method's contribution to feature selection in machine learning. The main conclusion is that the selected features are differentially significant between the classes and yield good classification accuracy, which should support downstream analyses aimed at strengthening biological interpretation.
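The main conclusion rests on the selected features being differentially significant between the classes. The abstract does not say which t-test variant is used, so the sketch below assumes Welch's unequal-variance t-statistic and computes it for every feature with Python's standard library. The function names `welch_t` and `feature_t_stats` and the +1/-1 class labels are hypothetical, chosen for illustration.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t-statistic (unequal variances assumed)."""
    va, vb = variance(a), variance(b)  # sample variances, n - 1 denominator
    return (mean(a) - mean(b)) / math.sqrt(va / len(a) + vb / len(b))


def feature_t_stats(X, y):
    """t-statistic of each feature (column of X) between classes +1 and -1."""
    stats = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == -1]
        stats.append(welch_t(pos, neg))
    return stats


if __name__ == "__main__":
    # Toy expression matrix: column 0 differs between classes, column 1 does not.
    X = [[5.0, 1.0], [6.0, 2.0], [7.0, 1.5], [1.0, 1.2], [2.0, 1.8], [3.0, 1.5]]
    y = [1, 1, 1, -1, -1, -1]
    print([round(t, 3) for t in feature_t_stats(X, y)])  # prints [4.899, 0.0]
```

A large absolute t marks a feature whose mean expression differs between the classes; turning it into a p-value requires the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`, which returns both the Welch statistic and a two-sided p-value.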

Additional Details
ISSN: 2560-8304
Volume: 1 (Special Issue), p. 35
DOI: 10.24870/cjb.2017-a22
Affiliation (all authors): Computational Genomics lab, Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, Kolkata 700032, India