FLOating-Window Projective Separator (FloWPS): A Data Trimming Tool for Support Vector Machines (SVM) to Improve Robustness of the Classifier
Here, we propose a heuristic technique of data trimming for SVM termed FLOating Window Projective Separator (FloWPS), tailored for personalized predictions based on molecular data. This procedure can operate with high throughput genetic datasets like gene expression or mutation profiles. Its applica...
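Main Authors: | Victor Tkachev, Maxim Sorokin, Artem Mescheryakov, Alexander Simonov, Andrew Garazha, Anton Buzdin, Ilya Muchnik, Nicolas Borisov |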
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2019-01-01 |
Series: | Frontiers in Genetics |
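Subjects: | bioinformatics, machine learning, oncology, gene expression, support vector machines, personalized medicine |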
Online Access: | https://www.frontiersin.org/article/10.3389/fgene.2018.00717/full |
_version_ | 1818020168009252864 |
---|---|
author | Victor Tkachev, Maxim Sorokin, Artem Mescheryakov, Alexander Simonov, Andrew Garazha, Anton Buzdin, Ilya Muchnik, Nicolas Borisov |
author_sort | Victor Tkachev |
collection | DOAJ |
description | Here, we propose a heuristic data trimming technique for SVM, termed FLOating Window Projective Separator (FloWPS), tailored for personalized predictions based on molecular data. The procedure can operate on high-throughput genetic datasets such as gene expression or mutation profiles. It prevents the SVM from extrapolating by excluding non-informative features. FloWPS requires training on data from individuals with known clinical outcomes to create a clinically relevant classifier. The genetic profiles linked with the outcomes are split, as usual, into training and validation datasets. The distinctive property of FloWPS is that irrelevant features of the validation dataset, those that do not have a significant number of neighboring hits in the training dataset, are removed from further analysis. Next, similarly to the k nearest neighbors (kNN) method, for each point of the validation dataset FloWPS takes into account only the proximal points of the training dataset. Thus, for every point of the validation dataset, the training dataset is adjusted to form a floating window. FloWPS performance was tested on ten gene expression datasets for 992 cancer patients who either responded or did not respond to different types of chemotherapy. Leave-one-out cross-validation confirmed that FloWPS significantly improves the quality of a classifier built on the classical SVM in most of the applications, particularly for polynomial kernels. |
first_indexed | 2024-04-14T08:02:20Z |
format | Article |
id | doaj.art-a9f04f9cb801495fb4d75d7ba1c5f01f |
institution | Directory of Open Access Journals |
issn | 1664-8021 |
language | English |
last_indexed | 2024-04-14T08:02:20Z |
publishDate | 2019-01-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Genetics |
spelling | doaj.art-a9f04f9cb801495fb4d75d7ba1c5f01f; 2022-12-22T02:04:52Z; eng; Frontiers Media S.A.; Frontiers in Genetics; 1664-8021; 2019-01-01; 9; 10.3389/fgene.2018.00717; 422738; FLOating-Window Projective Separator (FloWPS): A Data Trimming Tool for Support Vector Machines (SVM) to Improve Robustness of the Classifier; Victor Tkachev (Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States); Maxim Sorokin (Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States; Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia); Artem Mescheryakov (Yandex N.V. Corporation, Moscow, Russia); Alexander Simonov (Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States); Andrew Garazha (Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States); Anton Buzdin (Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States; Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia; I.M. Sechenov First Moscow State Medical University (Sechenov University), Moscow, Russia); Ilya Muchnik (Hill Center, Rutgers University, Piscataway, NJ, United States); Nicolas Borisov (Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States; I.M. Sechenov First Moscow State Medical University (Sechenov University), Moscow, Russia); https://www.frontiersin.org/article/10.3389/fgene.2018.00717/full; bioinformatics; machine learning; oncology; gene expression; support vector machines; personalized medicine |
title | FLOating-Window Projective Separator (FloWPS): A Data Trimming Tool for Support Vector Machines (SVM) to Improve Robustness of the Classifier |
topic | bioinformatics machine learning oncology gene expression support vector machines personalized medicine |
url | https://www.frontiersin.org/article/10.3389/fgene.2018.00717/full |
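
The two trimming steps named in the description (dropping features without enough neighboring hits, then keeping only the proximal training points) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function name `trim_for_sample`, the parameters `m` and `k`, the reading of "neighboring hits" as "at least `m` training values on each side of the sample's value", and the use of Euclidean distance are all assumptions made for this example. The trimmed training set would then be passed to an SVM implementation of the reader's choice.

```python
from math import dist


def trim_for_sample(train_X, test_x, m, k):
    """Return (kept_features, kept_rows) for a single sample to classify.

    train_X: list of training profiles (equal-length lists of floats)
    test_x:  the profile to classify
    m:       hypothetical threshold: minimum number of training values
             required on EACH side of the sample's value along a feature
    k:       hypothetical window size: number of nearest training points kept
    """
    # Step 1 (assumed criterion): keep a feature only if the sample's value
    # is flanked by at least m training values below and m above, so that
    # the classifier does not have to extrapolate along that axis.
    kept_features = []
    for j in range(len(test_x)):
        below = sum(1 for row in train_X if row[j] < test_x[j])
        above = sum(1 for row in train_X if row[j] > test_x[j])
        if below >= m and above >= m:
            kept_features.append(j)

    # Step 2: in the reduced feature space, keep the k training points
    # closest to the sample (Euclidean distance, in the spirit of kNN).
    def project(row):
        return [row[j] for j in kept_features]

    target = project(test_x)
    order = sorted(range(len(train_X)),
                   key=lambda i: dist(project(train_X[i]), target))
    kept_rows = sorted(order[:k])
    return kept_features, kept_rows


# Toy data: the second feature of the sample lies outside the training range.
train_X = [[1.0, 10.0], [2.0, 11.0], [3.0, 12.0], [4.0, 13.0]]
test_x = [2.5, 20.0]
features, rows = trim_for_sample(train_X, test_x, m=1, k=2)
print(features, rows)
```

In a leave-one-out setting, this function would be called once per held-out sample, so each sample gets its own "floating window" of features and training points before the classifier is fitted.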