tascCODA: Bayesian Tree-Aggregated Analysis of Compositional Amplicon and Single-Cell Data

Accurate generative statistical modeling of count data is of critical relevance for the analysis of biological datasets from high-throughput sequencing technologies. Important instances include the modeling of microbiome compositions from amplicon sequencing surveys and the analysis of cell type com...

Bibliographic Details
Main Authors: Johannes Ostner, Salomé Carcy, Christian L. Müller
Format: Article
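Language: English
Published: Frontiers Media S.A. 2021-12-01
Series: Frontiers in Genetics
Subjects: bayesian modeling; dirichlet multinomial; microbiome data; single-cell data; spike-and-slab lasso; tree aggregation
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2021.766405/full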
collection DOAJ
description Accurate generative statistical modeling of count data is of critical relevance for the analysis of biological datasets from high-throughput sequencing technologies. Important instances include the modeling of microbiome compositions from amplicon sequencing surveys and the analysis of cell type compositions derived from single-cell RNA sequencing. Microbial and cell type abundance data share remarkably similar statistical features, including their inherent compositionality and a natural hierarchical ordering of the individual components from taxonomic or cell lineage tree information, respectively. To this end, we introduce a Bayesian model for tree-aggregated amplicon and single-cell compositional data analysis (tascCODA) that seamlessly integrates hierarchical information and experimental covariate data into the generative modeling of compositional count data. By combining latent parameters based on the tree structure with spike-and-slab Lasso penalization, tascCODA can determine covariate effects across different levels of the population hierarchy in a data-driven parsimonious way. In the context of differential abundance testing, we validate tascCODA’s excellent performance on a comprehensive set of synthetic benchmark scenarios. Our analyses on human single-cell RNA-seq data from ulcerative colitis patients and amplicon data from patients with irritable bowel syndrome, respectively, identified aggregated cell type and taxon compositional changes that were more predictive and parsimonious than those proposed by other schemes. We posit that tascCODA constitutes a valuable addition to the growing statistical toolbox for generative modeling and analysis of compositional changes in microbial or cell population data.
id doaj.art-17a9575d6dd943369f034d710e8190a9
institution Directory Open Access Journal
issn 1664-8021
spelling doaj.art-17a9575d6dd943369f034d710e8190a9; 2022-12-21T22:58:15Z; eng; Frontiers Media S.A.; Frontiers in Genetics; ISSN 1664-8021; 2021-12-01; volume 12; DOI 10.3389/fgene.2021.766405; article 766405
Johannes Ostner: Department of Statistics, Ludwig-Maximilians-Universität München, Munich, Germany; Institute of Computational Biology, Helmholtz Zentrum München, Munich, Germany
Salomé Carcy: Institute of Computational Biology, Helmholtz Zentrum München, Munich, Germany; Department of Biology, École Normale Supérieure, PSL University, Paris, France
Christian L. Müller: Department of Statistics, Ludwig-Maximilians-Universität München, Munich, Germany; Institute of Computational Biology, Helmholtz Zentrum München, Munich, Germany; Center for Computational Mathematics, Flatiron Institute, New York, NY, United States
topic bayesian modeling
dirichlet multinomial
microbiome data
single-cell data
spike-and-slab lasso
tree aggregation