Posterior contraction rate of sparse latent feature models with application to proteomics
The Indian buffet process (IBP) and phylogenetic Indian buffet process (pIBP) can be used as prior models to infer latent features in a data set. The theoretical properties of these models are under-explored, however, especially in high dimensional settings. In this paper, we show that under mild sp...
Main Authors: | Tong Li, Tianjian Zhou, Kam-Wah Tsui, Lin Wei, Yuan Ji |
---|---|
Format: | Article |
Language: | English |
Published: | Taylor & Francis Group, 2022-01-01 |
Series: | Statistical Theory and Related Fields |
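Subjects: | high dimension; Indian buffet process; latent feature; Markov chain Monte Carlo; posterior convergence; reverse phase protein arrays |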
Online Access: | http://dx.doi.org/10.1080/24754269.2021.1974664 |
_version_ | 1827809220382162944 |
---|---|
author | Tong Li; Tianjian Zhou; Kam-Wah Tsui; Lin Wei; Yuan Ji |
author_facet | Tong Li; Tianjian Zhou; Kam-Wah Tsui; Lin Wei; Yuan Ji |
author_sort | Tong Li |
collection | DOAJ |
description | The Indian buffet process (IBP) and phylogenetic Indian buffet process (pIBP) can be used as prior models to infer latent features in a dataset. The theoretical properties of these models are under-explored, however, especially in high-dimensional settings. In this paper, we show that under a mild sparsity condition, the posterior distribution of the latent feature matrix, generated via IBP or pIBP priors, converges to the true latent feature matrix asymptotically. We derive the posterior convergence rate, referred to as the contraction rate. We show that the convergence results remain valid even when the dimensionality of the latent feature matrix increases with the sample size, thereby making the posterior inference valid in high-dimensional settings. We demonstrate the theoretical results using computer simulation, in which the parallel-tempering Markov chain Monte Carlo method is applied to overcome computational hurdles. The practical utility of the derived properties is demonstrated by inferring the latent features in a reverse phase protein array (RPPA) dataset under the IBP prior model. |
first_indexed | 2024-03-11T22:38:29Z |
format | Article |
id | doaj.art-0c95c9b856d3488292e996311217c0b3 |
institution | Directory of Open Access Journals |
issn | 2475-4269 2475-4277 |
language | English |
last_indexed | 2024-03-11T22:38:29Z |
publishDate | 2022-01-01 |
publisher | Taylor & Francis Group |
record_format | Article |
series | Statistical Theory and Related Fields |
spelling | doaj.art-0c95c9b856d3488292e996311217c0b3; 2023-09-22T09:19:46Z; eng; Taylor & Francis Group; Statistical Theory and Related Fields; ISSN 2475-4269, 2475-4277; 2022-01-01; vol. 6, no. 1, pp. 29-39; 10.1080/24754269.2021.1974664; 1974664; Posterior contraction rate of sparse latent feature models with application to proteomics; Tong Li (Columbia University), Tianjian Zhou (Colorado State University), Kam-Wah Tsui (University of Wisconsin–Madison), Lin Wei (NorthShore University HealthSystem), Yuan Ji (University of Chicago); http://dx.doi.org/10.1080/24754269.2021.1974664; high dimension; indian buffet process; latent feature; markov chain monte carlo; posterior convergence; reverse phase protein arrays |
title | Posterior contraction rate of sparse latent feature models with application to proteomics |
topic | high dimension; Indian buffet process; latent feature; Markov chain Monte Carlo; posterior convergence; reverse phase protein arrays |
url | http://dx.doi.org/10.1080/24754269.2021.1974664 |