Dirichlet multinomial mixtures: generative models for microbial metagenomics.

Bibliographic Details
Main Authors: Ian Holmes, Keith Harris, Christopher Quince
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2012-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3272020?pdf=render

Abstract

We introduce Dirichlet multinomial mixtures (DMM) for the probabilistic modelling of microbial metagenomics data. These data can be represented as a frequency matrix giving the number of times each taxon is observed in each sample. The samples differ in size, and the matrix is sparse, because communities are diverse and skewed towards rare taxa. Most methods previously used to classify or cluster samples have ignored these features. We describe each community by a vector of taxa probabilities. These vectors are generated from one of a finite number of Dirichlet mixture components, each with different hyperparameters. Observed samples are then generated through multinomial sampling. The mixture components cluster communities into distinct 'metacommunities' and hence determine envirotypes or enterotypes: groups of communities with a similar composition. The model can also deduce the impact of a treatment and be used for classification. We wrote software for fitting DMM models using the 'evidence framework' (http://code.google.com/p/microbedmm/). This includes the Laplace approximation of the model evidence. We applied the DMM model to genus-level frequencies of human gut microbes from obese and lean twins. The model evidence indicated that four clusters fit these data best. Two clusters were dominated by Bacteroides and were homogeneous; the other two had a more variable community composition. We could not find a significant impact of body mass on community structure. However, obese twins were more likely to derive from the high-variance clusters. We propose that obesity is not associated with a distinct microbiota but increases the chance that an individual derives from a disturbed enterotype. This is an example of the 'Anna Karenina principle' (AKP) applied to microbial communities: disturbed states have many more configurations than undisturbed ones. We verify this by showing that, in a study of inflammatory bowel disease (IBD) phenotypes, ileal Crohn's disease (ICD) is associated with a more variable community.
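
The generative process described in the abstract can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the microbedmm software: the function names, the two-component weights and hyperparameters, and the sample depths are hypothetical. Each sample picks a mixture component, draws its taxa probabilities from that component's Dirichlet, and then draws reads by multinomial sampling. Because the Dirichlet can be integrated out analytically, each component's likelihood is a Dirichlet-multinomial, which yields a posterior over components for assigning a sample to a metacommunity. The sketch does not fit the hyperparameters or compute the Laplace-approximated evidence used to choose the number of clusters.

```python
# Minimal sketch of a Dirichlet multinomial mixture (hypothetical names and
# parameters; an illustration, not the microbedmm implementation).
import math
import random
from collections import Counter


def sample_dirichlet(alpha, rng):
    """Draw a probability vector from Dirichlet(alpha) by normalising Gamma(alpha_j, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_community(weights, alphas, depth, rng):
    """Generate one sample: choose component k with probability weights[k],
    draw taxa probabilities p ~ Dirichlet(alphas[k]), then draw `depth`
    reads from Multinomial(depth, p). Returns (k, counts)."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    p = sample_dirichlet(alphas[k], rng)
    reads = Counter(rng.choices(range(len(p)), weights=p, k=depth))
    return k, [reads[j] for j in range(len(p))]


def dm_log_likelihood(counts, alpha):
    """log P(counts | alpha) for one Dirichlet-multinomial component,
    with the community's taxa probabilities integrated out."""
    n, a_sum = sum(counts), sum(alpha)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    log_norm = math.lgamma(a_sum) - math.lgamma(n + a_sum)
    log_terms = sum(math.lgamma(x + a) - math.lgamma(a) for x, a in zip(counts, alpha))
    return log_coef + log_norm + log_terms


def responsibilities(counts, weights, alphas):
    """Posterior probability that `counts` came from each component (soft clustering)."""
    logs = [math.log(w) + dm_log_likelihood(counts, a) for w, a in zip(weights, alphas)]
    top = max(logs)  # subtract the max before exponentiating (log-sum-exp trick)
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical two-component mixture over four taxa: a concentrated component
    # dominated by taxon 0 and a low-concentration, more variable component.
    weights = [0.5, 0.5]
    alphas = [[50.0, 5.0, 5.0, 5.0], [2.0, 2.0, 2.0, 2.0]]
    for depth in (200, 1000):  # samples of different sizes
        k, counts = sample_community(weights, alphas, depth, rng)
        post = responsibilities(counts, weights, alphas)
        print(f"true component={k} counts={counts} posterior={[round(r, 3) for r in post]}")
```

A smaller total concentration (the sum A of a component's hyperparameters) gives more variable community compositions around the same mean, since under a Dirichlet the variance of each taxon probability is m_j(1 - m_j)/(A + 1) for mean m_j. In this sketch the second hypothetical component therefore plays the role of a high-variance cluster.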

ISSN: 1932-6203
DOI: 10.1371/journal.pone.0030126 (PLoS ONE, vol. 7, no. 2, e30126)