Posterior contraction rate of sparse latent feature models with application to proteomics

Bibliographic Details
Main Authors: Tong Li, Tianjian Zhou, Kam-Wah Tsui, Lin Wei, Yuan Ji
Format: Article
Language: English
Published: Taylor & Francis Group 2022-01-01
Series: Statistical Theory and Related Fields
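Subjects: high dimension; Indian buffet process; latent feature; Markov chain Monte Carlo; posterior convergence; reverse phase protein arrays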
Online Access: http://dx.doi.org/10.1080/24754269.2021.1974664
collection DOAJ
description The Indian buffet process (IBP) and the phylogenetic Indian buffet process (pIBP) can be used as prior models to infer latent features in a data set. The theoretical properties of these models are under-explored, however, especially in high-dimensional settings. In this paper, we show that, under a mild sparsity condition, the posterior distribution of the latent feature matrix, generated via IBP or pIBP priors, converges to the true latent feature matrix asymptotically. We derive the posterior convergence rate, referred to as the contraction rate. We show that the convergence results remain valid even when the dimensionality of the latent feature matrix increases with the sample size, thereby making the posterior inference valid in high-dimensional settings. We demonstrate the theoretical results using computer simulation, in which the parallel-tempering Markov chain Monte Carlo method is applied to overcome computational hurdles. The practical utility of the derived properties is demonstrated by inferring the latent features in a reverse phase protein array (RPPA) dataset under the IBP prior model.
first_indexed 2024-03-11T22:38:29Z
format Article
id doaj.art-0c95c9b856d3488292e996311217c0b3
institution Directory Open Access Journal
issn 2475-4269
2475-4277
language English
last_indexed 2024-03-11T22:38:29Z
spelling doaj.art-0c95c9b856d3488292e996311217c0b3; 2023-09-22T09:19:46Z; eng; Taylor & Francis Group; Statistical Theory and Related Fields; ISSN 2475-4269, 2475-4277; 2022-01-01; vol. 6, no. 1, pp. 29-39; doi 10.1080/24754269.2021.1974664; 1974664; Posterior contraction rate of sparse latent feature models with application to proteomics; Tong Li (Columbia University), Tianjian Zhou (Colorado State University), Kam-Wah Tsui (University of Wisconsin–Madison), Lin Wei (NorthShore University HealthSystem), Yuan Ji (University of Chicago); http://dx.doi.org/10.1080/24754269.2021.1974664; high dimension, Indian buffet process, latent feature, Markov chain Monte Carlo, posterior convergence, reverse phase protein arrays
topic high dimension
Indian buffet process
latent feature
Markov chain Monte Carlo
posterior convergence
reverse phase protein arrays
url http://dx.doi.org/10.1080/24754269.2021.1974664