SuSiE PCA: A scalable Bayesian variable selection technique for principal component analysis
Summary: Latent factor models, like principal component analysis (PCA), provide a statistical framework to infer low-rank representation in various biological contexts. However, feature selection is challenging when this low-rank structure manifests from a sparse subspace. We introduce SuSiE PCA, a...
Main Authors: | Dong Yuan, Nicholas Mancuso |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2023-11-01 |
Series: | iScience |
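Subjects: | Biocomputational method; Classification of bioinformatical subject; data processing in systems biology; Algorithms |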
Online Access: | http://www.sciencedirect.com/science/article/pii/S2589004223022587 |
_version_ | 1797643614292017152 |
---|---|
author | Dong Yuan Nicholas Mancuso |
author_sort | Dong Yuan |
collection | DOAJ |
description | Summary: Latent factor models, like principal component analysis (PCA), provide a statistical framework to infer low-rank representation in various biological contexts. However, feature selection is challenging when this low-rank structure manifests from a sparse subspace. We introduce SuSiE PCA, a scalable sparse latent factor approach that evaluates uncertainty in contributing variables through posterior inclusion probabilities. We validate our model in extensive simulations and demonstrate that SuSiE PCA outperforms other approaches in signal detection and model robustness. We apply SuSiE PCA to multi-tissue expression quantitative trait loci (eQTLs) data from GTEx v8 and identify tissue-specific factors and their contributing eGenes. We further investigate its performance on the large-scale perturbation data and find that SuSiE PCA identifies modules with a higher enrichment of ribosome-related genes than sparse PCA (false discovery rate [FDR] = 9.2×10⁻⁸² vs. 1.4×10⁻³³), while being ∼18× faster. Overall, SuSiE PCA provides an efficient tool to identify relevant features in high-dimensional biological data. |
first_indexed | 2024-03-11T14:17:27Z |
format | Article |
id | doaj.art-9b3352ed968e423286f7c9442926cd5c |
institution | Directory Open Access Journal |
issn | 2589-0042 |
language | English |
last_indexed | 2024-03-11T14:17:27Z |
publishDate | 2023-11-01 |
publisher | Elsevier |
record_format | Article |
series | iScience |
spelling | doaj.art-9b3352ed968e423286f7c9442926cd5c 2023-11-01T04:47:52Z eng Elsevier iScience 2589-0042 2023-11-01 26 11 108181 SuSiE PCA: A scalable Bayesian variable selection technique for principal component analysis Dong Yuan 0 Nicholas Mancuso 1 Biostatistics Division, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA; Corresponding author Biostatistics Division, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA; Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA; Corresponding author Summary: Latent factor models, like principal component analysis (PCA), provide a statistical framework to infer low-rank representation in various biological contexts. However, feature selection is challenging when this low-rank structure manifests from a sparse subspace. We introduce SuSiE PCA, a scalable sparse latent factor approach that evaluates uncertainty in contributing variables through posterior inclusion probabilities. We validate our model in extensive simulations and demonstrate that SuSiE PCA outperforms other approaches in signal detection and model robustness. We apply SuSiE PCA to multi-tissue expression quantitative trait loci (eQTLs) data from GTEx v8 and identify tissue-specific factors and their contributing eGenes. We further investigate its performance on the large-scale perturbation data and find that SuSiE PCA identifies modules with a higher enrichment of ribosome-related genes than sparse PCA (false discovery rate [FDR] = 9.2×10⁻⁸² vs. 1.4×10⁻³³), while being ∼18× faster. Overall, SuSiE PCA provides an efficient tool to identify relevant features in high-dimensional biological data. http://www.sciencedirect.com/science/article/pii/S2589004223022587 Biocomputational method Classification of bioinformatical subject data processing in systems biology Algorithms |
spellingShingle | Dong Yuan Nicholas Mancuso SuSiE PCA: A scalable Bayesian variable selection technique for principal component analysis iScience Biocomputational method Classification of bioinformatical subject data processing in systems biology Algorithms |
title | SuSiE PCA: A scalable Bayesian variable selection technique for principal component analysis |
topic | Biocomputational method Classification of bioinformatical subject data processing in systems biology Algorithms |
url | http://www.sciencedirect.com/science/article/pii/S2589004223022587 |