tascCODA: Bayesian Tree-Aggregated Analysis of Compositional Amplicon and Single-Cell Data
Accurate generative statistical modeling of count data is of critical relevance for the analysis of biological datasets from high-throughput sequencing technologies. Important instances include the modeling of microbiome compositions from amplicon sequencing surveys and the analysis of cell type com...
Main Authors: | Johannes Ostner, Salomé Carcy, Christian L. Müller |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A. 2021-12-01 |
Series: | Frontiers in Genetics |
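Subjects: | bayesian modeling; dirichlet multinomial; microbiome data; single-cell data; spike-and-slab lasso; tree aggregation |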
Online Access: | https://www.frontiersin.org/articles/10.3389/fgene.2021.766405/full |
_version_ | 1818425383710621696 |
---|---|
author | Johannes Ostner; Salomé Carcy; Christian L. Müller |
author_facet | Johannes Ostner; Salomé Carcy; Christian L. Müller |
author_sort | Johannes Ostner |
collection | DOAJ |
description | Accurate generative statistical modeling of count data is of critical relevance for the analysis of biological datasets from high-throughput sequencing technologies. Important instances include the modeling of microbiome compositions from amplicon sequencing surveys and the analysis of cell type compositions derived from single-cell RNA sequencing. Microbial and cell type abundance data share remarkably similar statistical features, including their inherent compositionality and a natural hierarchical ordering of the individual components from taxonomic or cell lineage tree information, respectively. To this end, we introduce a Bayesian model for tree-aggregated amplicon and single-cell compositional data analysis (tascCODA) that seamlessly integrates hierarchical information and experimental covariate data into the generative modeling of compositional count data. By combining latent parameters based on the tree structure with spike-and-slab Lasso penalization, tascCODA can determine covariate effects across different levels of the population hierarchy in a data-driven parsimonious way. In the context of differential abundance testing, we validate tascCODA’s excellent performance on a comprehensive set of synthetic benchmark scenarios. Our analyses on human single-cell RNA-seq data from ulcerative colitis patients and amplicon data from patients with irritable bowel syndrome, respectively, identified aggregated cell type and taxon compositional changes that were more predictive and parsimonious than those proposed by other schemes. We posit that tascCODA constitutes a valuable addition to the growing statistical toolbox for generative modeling and analysis of compositional changes in microbial or cell population data. |
first_indexed | 2024-12-14T14:13:04Z |
format | Article |
id | doaj.art-17a9575d6dd943369f034d710e8190a9 |
institution | Directory Open Access Journal |
issn | 1664-8021 |
language | English |
last_indexed | 2024-12-14T14:13:04Z |
publishDate | 2021-12-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Genetics |
spelling | doaj.art-17a9575d6dd943369f034d710e8190a9; 2022-12-21T22:58:15Z; eng; Frontiers Media S.A.; Frontiers in Genetics; 1664-8021; 2021-12-01; 12; 10.3389/fgene.2021.766405; 766405; tascCODA: Bayesian Tree-Aggregated Analysis of Compositional Amplicon and Single-Cell Data; Johannes Ostner (Department of Statistics, Ludwig-Maximilians-Universität München, Munich, Germany, and Institute of Computational Biology, Helmholtz Zentrum München, Munich, Germany); Salomé Carcy (Institute of Computational Biology, Helmholtz Zentrum München, Munich, Germany, and Department of Biology, École Normale Supérieure, PSL University, Paris, France); Christian L. Müller (Department of Statistics, Ludwig-Maximilians-Universität München, Munich, Germany, Institute of Computational Biology, Helmholtz Zentrum München, Munich, Germany, and Center for Computational Mathematics, Flatiron Institute, New York, NY, United States); Accurate generative statistical modeling of count data is of critical relevance for the analysis of biological datasets from high-throughput sequencing technologies. Important instances include the modeling of microbiome compositions from amplicon sequencing surveys and the analysis of cell type compositions derived from single-cell RNA sequencing. Microbial and cell type abundance data share remarkably similar statistical features, including their inherent compositionality and a natural hierarchical ordering of the individual components from taxonomic or cell lineage tree information, respectively. To this end, we introduce a Bayesian model for tree-aggregated amplicon and single-cell compositional data analysis (tascCODA) that seamlessly integrates hierarchical information and experimental covariate data into the generative modeling of compositional count data. By combining latent parameters based on the tree structure with spike-and-slab Lasso penalization, tascCODA can determine covariate effects across different levels of the population hierarchy in a data-driven parsimonious way. In the context of differential abundance testing, we validate tascCODA’s excellent performance on a comprehensive set of synthetic benchmark scenarios. Our analyses on human single-cell RNA-seq data from ulcerative colitis patients and amplicon data from patients with irritable bowel syndrome, respectively, identified aggregated cell type and taxon compositional changes that were more predictive and parsimonious than those proposed by other schemes. We posit that tascCODA constitutes a valuable addition to the growing statistical toolbox for generative modeling and analysis of compositional changes in microbial or cell population data.; https://www.frontiersin.org/articles/10.3389/fgene.2021.766405/full; bayesian modeling; dirichlet multinomial; microbiome data; single-cell data; spike-and-slab lasso; tree aggregation |
spellingShingle | Johannes Ostner; Salomé Carcy; Christian L. Müller; tascCODA: Bayesian Tree-Aggregated Analysis of Compositional Amplicon and Single-Cell Data; Frontiers in Genetics; bayesian modeling; dirichlet multinomial; microbiome data; single-cell data; spike-and-slab lasso; tree aggregation |
title | tascCODA: Bayesian Tree-Aggregated Analysis of Compositional Amplicon and Single-Cell Data |
title_full | tascCODA: Bayesian Tree-Aggregated Analysis of Compositional Amplicon and Single-Cell Data |
title_fullStr | tascCODA: Bayesian Tree-Aggregated Analysis of Compositional Amplicon and Single-Cell Data |
title_full_unstemmed | tascCODA: Bayesian Tree-Aggregated Analysis of Compositional Amplicon and Single-Cell Data |
title_short | tascCODA: Bayesian Tree-Aggregated Analysis of Compositional Amplicon and Single-Cell Data |
title_sort | tasccoda bayesian tree aggregated analysis of compositional amplicon and single cell data |
topic | bayesian modeling; dirichlet multinomial; microbiome data; single-cell data; spike-and-slab lasso; tree aggregation |
url | https://www.frontiersin.org/articles/10.3389/fgene.2021.766405/full |