So you think you can PLS-DA?

Abstract. Background: Partial Least-Squares Discriminant Analysis (PLS-DA) is a popular machine learning tool that is gaining increasing attention as a useful feature selector and classifier. In an effort to understand its strengths and weaknesses, we performed a series of experiments with synthetic d...


Bibliographic Details
Main Authors:	Daniel Ruiz-Perez, Haibin Guan, Purnima Madhivanan, Kalai Mathee, Giri Narasimhan
Format:	Article
Language:	English
Published:	BMC 2020-12-01
Series:	BMC Bioinformatics
Subjects:	PLS-DA, PCA, Feature selection, Dimensionality reduction, Bioinformatics
Online Access:	https://doi.org/10.1186/s12859-019-3310-7

author	Daniel Ruiz-Perez, Haibin Guan, Purnima Madhivanan, Kalai Mathee, Giri Narasimhan
collection	DOAJ
description	Abstract. Background: Partial Least-Squares Discriminant Analysis (PLS-DA) is a popular machine learning tool that is gaining increasing attention as a useful feature selector and classifier. In an effort to understand its strengths and weaknesses, we performed a series of experiments with synthetic data and compared its performance to its close relative from which it was initially invented, namely Principal Component Analysis (PCA). Results: We demonstrate that even though PCA ignores the information regarding the class labels of the samples, this unsupervised tool can be remarkably effective as a feature selector. In some cases, it outperforms PLS-DA, which is made aware of the class labels in its input. Our experiments range from looking at the signal-to-noise ratio in the feature selection task, to considering many practical distributions and models encountered when analyzing bioinformatics and clinical data. Other methods were also evaluated. Finally, we analyzed an interesting data set from 396 vaginal microbiome samples where the ground truth for the feature selection was available. All the 3D figures shown in this paper as well as the supplementary ones can be viewed interactively at http://biorg.cs.fiu.edu/plsda. Conclusions: Our results highlighted the strengths and weaknesses of PLS-DA in comparison with PCA for different underlying data models.
first_indexed	2024-12-16T18:42:52Z
format	Article
id	doaj.art-6a907005b0d448f7adfb27c5cfbdf021
institution	Directory of Open Access Journals
issn	1471-2105
language	English
last_indexed	2024-12-16T18:42:52Z
publishDate	2020-12-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-6a907005b0d448f7adfb27c5cfbdf021 | 2022-12-21T22:20:58Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2020-12-01 | 21 | S1 | 1 | 10 | 10.1186/s12859-019-3310-7 | So you think you can PLS-DA? | Daniel Ruiz-Perez (0), Haibin Guan (1), Purnima Madhivanan (2), Kalai Mathee (3), Giri Narasimhan (4) | (0) Bioinformatics Research Group (BioRG), Florida International University; (1) Bioinformatics Research Group (BioRG), Florida International University; (2) Department of Epidemiology, Florida International University; (3) Herbert Wertheim College of Medicine, Florida International University; (4) Bioinformatics Research Group (BioRG), Florida International University | https://doi.org/10.1186/s12859-019-3310-7 | PLS-DA; PCA; Feature selection; Dimensionality reduction; Bioinformatics
title	So you think you can PLS-DA?
topic	PLS-DA, PCA, Feature selection, Dimensionality reduction, Bioinformatics
url	https://doi.org/10.1186/s12859-019-3310-7


Similar Items