FLOating-Window Projective Separator (FloWPS): A Data Trimming Tool for Support Vector Machines (SVM) to Improve Robustness of the Classifier

Bibliographic Details
Main Authors: Victor Tkachev, Maxim Sorokin, Artem Mescheryakov, Alexander Simonov, Andrew Garazha, Anton Buzdin, Ilya Muchnik, Nicolas Borisov
Format: Article
Language: English
Published: Frontiers Media S.A. 2019-01-01
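Series: Frontiers in Genetics
Subjects: bioinformatics, machine learning, oncology, gene expression, support vector machines, personalized medicine
Online Access: https://www.frontiersin.org/article/10.3389/fgene.2018.00717/full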
author Victor Tkachev
Maxim Sorokin
Artem Mescheryakov
Alexander Simonov
Andrew Garazha
Anton Buzdin
Ilya Muchnik
Nicolas Borisov
author_sort Victor Tkachev
collection DOAJ
description Here, we propose a heuristic data-trimming technique for SVM, termed FLOating Window Projective Separator (FloWPS), tailored for personalized predictions based on molecular data. The procedure can operate on high-throughput genetic datasets such as gene expression or mutation profiles. Its application prevents the SVM from extrapolating by excluding non-informative features. FloWPS requires training on data from individuals with known clinical outcomes to create a clinically relevant classifier. The genetic profiles linked to the outcomes are split, as usual, into training and validation datasets. The unique property of FloWPS is that irrelevant features in the validation dataset, namely those that do not have a significant number of neighboring hits in the training dataset, are removed from further analysis. Next, similarly to the k nearest neighbors (kNN) method, for each point of the validation dataset FloWPS takes into account only the proximal points of the training dataset. Thus, for every point of the validation dataset, the training dataset is adjusted to form a floating window. FloWPS performance was tested on ten gene expression datasets covering 992 cancer patients who either responded or did not respond to different types of chemotherapy. Leave-one-out cross-validation confirmed experimentally that FloWPS significantly increases the quality of classifiers built on the classical SVM in most of the applications, particularly for polynomial kernels.
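The trimming step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `trim_for_point` and the parameters `m` and `k` are hypothetical, and the sketch assumes that a "neighboring hit" is a training value lying on a given side of the query value along one feature axis. A feature is kept only when at least `m` training values lie on each side, and the floating window is then taken as the `k` training points nearest to the query in the kept subspace.

```python
import math


def trim_for_point(train, query, m, k):
    """Return (kept_features, window) for a single query point.

    train: list of equal-length feature vectors (lists of floats)
    query: feature vector of the point to be classified
    m:     assumed minimum number of training values on each side of the query
    k:     assumed number of nearest training points kept in the window
    """
    kept = []
    for f in range(len(query)):
        below = sum(1 for row in train if row[f] < query[f])
        above = sum(1 for row in train if row[f] > query[f])
        # Keep the feature only if the query sits inside the training range
        # with at least m points on each side (interpolation, not extrapolation).
        if below >= m and above >= m:
            kept.append(f)

    def dist(row):
        # Euclidean distance to the query, restricted to the kept features.
        return math.sqrt(sum((row[f] - query[f]) ** 2 for f in kept))

    # Indices of the k training points closest to the query.
    order = sorted(range(len(train)), key=lambda i: dist(train[i]))
    return kept, order[:k]


train = [[0.0, 5.0], [1.0, 6.0], [2.0, 7.0], [3.0, 8.0]]
query = [1.5, 9.0]
kept, window = trim_for_point(train, query, m=1, k=2)
print(kept, window)  # [0] [1, 2]
```

In this toy input the second feature is dropped because the query value 9.0 lies above every training value, so a classifier would have to extrapolate along that axis. Under the same assumptions, an SVM would then be fitted only on the rows in `window`, restricted to the columns in `kept`, and the procedure repeated for each left-out point.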
first_indexed 2024-04-14T08:02:20Z
format Article
id doaj.art-a9f04f9cb801495fb4d75d7ba1c5f01f
institution Directory of Open Access Journals
issn 1664-8021
language English
last_indexed 2024-04-14T08:02:20Z
publishDate 2019-01-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Genetics
spelling doaj.art-a9f04f9cb801495fb4d75d7ba1c5f01f 2022-12-22T02:04:52Z eng Frontiers Media S.A. Frontiers in Genetics 1664-8021 2019-01-01 9 10.3389/fgene.2018.00717 422738
FLOating-Window Projective Separator (FloWPS): A Data Trimming Tool for Support Vector Machines (SVM) to Improve Robustness of the Classifier
Victor Tkachev: Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States
Maxim Sorokin: Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States; Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
Artem Mescheryakov: Yandex N.V. Corporation, Moscow, Russia
Alexander Simonov: Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States
Andrew Garazha: Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States
Anton Buzdin: Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States; Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia; I.M. Sechenov First Moscow State Medical University (Sechenov University), Moscow, Russia
Ilya Muchnik: Hill Center, Rutgers University, Piscataway, NJ, United States
Nicolas Borisov: Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States; I.M. Sechenov First Moscow State Medical University (Sechenov University), Moscow, Russia
https://www.frontiersin.org/article/10.3389/fgene.2018.00717/full
bioinformatics; machine learning; oncology; gene expression; support vector machines; personalized medicine
title FLOating-Window Projective Separator (FloWPS): A Data Trimming Tool for Support Vector Machines (SVM) to Improve Robustness of the Classifier
topic bioinformatics
machine learning
oncology
gene expression
support vector machines
personalized medicine
url https://www.frontiersin.org/article/10.3389/fgene.2018.00717/full