SuSiE PCA: A scalable Bayesian variable selection technique for principal component analysis

Summary: Latent factor models, like principal component analysis (PCA), provide a statistical framework to infer low-rank representation in various biological contexts. However, feature selection is challenging when this low-rank structure manifests from a sparse subspace. We introduce SuSiE PCA, a...


Bibliographic Details
Main Authors: Dong Yuan, Nicholas Mancuso
Format: Article
Language: English
Published: Elsevier 2023-11-01
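Series: iScience
Subjects: Biocomputational method; Classification of bioinformatical subject; data processing in systems biology; Algorithms
Online Access: http://www.sciencedirect.com/science/article/pii/S2589004223022587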
_version_ 1797643614292017152
author Dong Yuan
Nicholas Mancuso
author_facet Dong Yuan
Nicholas Mancuso
author_sort Dong Yuan
collection DOAJ
description Summary: Latent factor models, like principal component analysis (PCA), provide a statistical framework to infer low-rank representation in various biological contexts. However, feature selection is challenging when this low-rank structure manifests from a sparse subspace. We introduce SuSiE PCA, a scalable sparse latent factor approach that evaluates uncertainty in contributing variables through posterior inclusion probabilities. We validate our model in extensive simulations and demonstrate that SuSiE PCA outperforms other approaches in signal detection and model robustness. We apply SuSiE PCA to multi-tissue expression quantitative trait loci (eQTLs) data from GTEx v8 and identify tissue-specific factors and their contributing eGenes. We further investigate its performance on the large-scale perturbation data and find that SuSiE PCA identifies modules with a higher enrichment of ribosome-related genes than sparse PCA (false discovery rate [FDR] = 9.2×10^−82 vs. 1.4×10^−33), while being ∼18× faster. Overall, SuSiE PCA provides an efficient tool to identify relevant features in high-dimensional biological data.
first_indexed 2024-03-11T14:17:27Z
format Article
id doaj.art-9b3352ed968e423286f7c9442926cd5c
institution Directory Open Access Journal
issn 2589-0042
language English
last_indexed 2024-03-11T14:17:27Z
publishDate 2023-11-01
publisher Elsevier
record_format Article
series iScience
spelling doaj.art-9b3352ed968e423286f7c9442926cd5c
2023-11-01T04:47:52Z
eng
Elsevier
iScience
2589-0042
2023-11-01
26
11
108181
SuSiE PCA: A scalable Bayesian variable selection technique for principal component analysis
Dong Yuan (0)
Nicholas Mancuso (1)
(0) Biostatistics Division, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA; Corresponding author
(1) Biostatistics Division, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA; Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA; Corresponding author
Summary: Latent factor models, like principal component analysis (PCA), provide a statistical framework to infer low-rank representation in various biological contexts. However, feature selection is challenging when this low-rank structure manifests from a sparse subspace. We introduce SuSiE PCA, a scalable sparse latent factor approach that evaluates uncertainty in contributing variables through posterior inclusion probabilities. We validate our model in extensive simulations and demonstrate that SuSiE PCA outperforms other approaches in signal detection and model robustness. We apply SuSiE PCA to multi-tissue expression quantitative trait loci (eQTLs) data from GTEx v8 and identify tissue-specific factors and their contributing eGenes. We further investigate its performance on the large-scale perturbation data and find that SuSiE PCA identifies modules with a higher enrichment of ribosome-related genes than sparse PCA (false discovery rate [FDR] = 9.2×10^−82 vs. 1.4×10^−33), while being ∼18× faster. Overall, SuSiE PCA provides an efficient tool to identify relevant features in high-dimensional biological data.
http://www.sciencedirect.com/science/article/pii/S2589004223022587
Biocomputational method
Classification of bioinformatical subject
data processing in systems biology
Algorithms
spellingShingle Dong Yuan
Nicholas Mancuso
SuSiE PCA: A scalable Bayesian variable selection technique for principal component analysis
iScience
Biocomputational method
Classification of bioinformatical subject
data processing in systems biology
Algorithms
title SuSiE PCA: A scalable Bayesian variable selection technique for principal component analysis
title_sort susie pca a scalable bayesian variable selection technique for principal component analysis
topic Biocomputational method
Classification of bioinformatical subject
data processing in systems biology
Algorithms
url http://www.sciencedirect.com/science/article/pii/S2589004223022587