A Bayesian method for identifying associations between response variables and bacterial community composition.


Bibliographic Details
Main Authors: Adrian Verster, Nicholas Petronella, Judy Green, Fernando Matias, Stephen P J Brooks
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2022-07-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1010108
collection DOAJ
description Determining associations between intestinal bacteria and continuously measured physiological outcomes is important for understanding the bacteria-host relationship, but is not straightforward because abundance data (compositional data) are not normally distributed. To address this issue, we developed a fully Bayesian linear regression model (BRACoD; Bayesian Regression Analysis of Compositional Data) with physiological measurements (continuous data) as a function of a matrix of relative bacterial abundances. Bacteria can be classified as operational taxonomic units or by taxonomy (genus, family, etc.). Bacteria associated with the physiological measurement were identified using a Bayesian variable selection method: Stochastic Search Variable Selection. The output is a list of inclusion probabilities ([Formula: see text]) and coefficients that indicate the strength of the association ([Formula: see text]) for each bacterial taxon. Tests with simulated communities showed that adopting a cut point value of [Formula: see text] ≥ 0.3 for identifying included bacteria optimized the true positive rate (TPR) while maintaining a false positive rate (FPR) of ≤ 5%. At this point, the chances of identifying non-contributing bacteria were low and all well-established contributors were included. Comparison with other methods showed that BRACoD (at [Formula: see text] ≥ 0.3) had higher precision and a higher TPR than a commonly used centered log-ratio transformed LASSO procedure (clr-LASSO), as well as a higher TPR than an off-the-shelf Spike and Slab method after centered log-ratio transformation (clr-SS). BRACoD was also less likely to include non-contributing bacteria that merely correlate with contributing bacteria. Analysis of a rat microbiome experiment identified 47 operational taxonomic units that contributed to fecal butyrate levels. Of these, 31 were positively and 16 negatively associated with butyrate. Consistent with their known role in butyrate metabolism, most of these fell within the Lachnospiraceae and Ruminococcaceae. We conclude that BRACoD provides a more precise and accurate method for determining bacteria associated with a continuous physiological outcome compared to clr-LASSO. It is more sensitive than a generalized clr-SS algorithm, although it has a higher FPR. Its ability to distinguish genuine contributors from correlated bacteria makes it better suited to discriminating bacteria that directly contribute to an outcome. The algorithm corrects for the distortions arising from compositional data, making it appropriate for analysis of microbiome data.
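Two steps named in the description can be illustrated with a short sketch: the centered log-ratio (clr) transform used by the comparison methods, and the scoring of a cut point on inclusion probabilities against known contributors in a simulated community. This is a minimal illustration in Python with NumPy, not BRACoD's implementation. The function names `clr` and `evaluate_cut_point`, the `pseudocount` value, and the toy numbers are assumptions made for the example; only the 0.3 cut point comes from the description.

```python
import numpy as np


def clr(rel_abund, pseudocount=1e-6):
    """Centered log-ratio transform of a samples-by-taxa abundance matrix.

    The pseudocount is an assumption of this sketch; it avoids log(0).
    """
    x = np.asarray(rel_abund, dtype=float) + pseudocount
    log_x = np.log(x)
    # Subtract each sample's mean log abundance (log of its geometric mean).
    return log_x - log_x.mean(axis=1, keepdims=True)


def evaluate_cut_point(inclusion_prob, is_contributor, cut=0.3):
    """Score taxa selected at inclusion probability >= cut against the truth."""
    p = np.asarray(inclusion_prob, dtype=float)
    truth = np.asarray(is_contributor, dtype=bool)
    selected = p >= cut

    tp = int(np.sum(selected & truth))   # contributors that were selected
    fp = int(np.sum(selected & ~truth))  # non-contributors that were selected

    def ratio(num, den):
        # Return NaN instead of dividing by zero when a class is empty.
        return num / den if den > 0 else float("nan")

    return {
        "tpr": ratio(tp, int(truth.sum())),
        "fpr": ratio(fp, int((~truth).sum())),
        "precision": ratio(tp, int(selected.sum())),
    }


# Toy data (hypothetical): 5 taxa, of which the first two truly contribute.
toy_prob = [0.90, 0.40, 0.10, 0.35, 0.05]
toy_truth = [True, True, False, False, False]
toy_scores = evaluate_cut_point(toy_prob, toy_truth, cut=0.3)

# Toy relative abundances (hypothetical): 2 samples by 3 taxa.
toy_clr = clr([[0.2, 0.3, 0.5], [0.1, 0.1, 0.8]])
```

Each row of the clr output sums to zero, which is the defining property of the transform. Sweeping `cut` over a grid of values and recording `tpr` and `fpr` is one way to see how a threshold trades sensitivity against false positives in simulated data.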
id doaj.art-c111df3a15c543f2b6d69137493e732f
institution Directory Open Access Journal
issn 1553-734X
1553-7358
spelling PLoS Computational Biology 18(7): e1010108 (2022-07-01), https://doi.org/10.1371/journal.pcbi.1010108