A Bayesian method for identifying associations between response variables and bacterial community composition.
Determining associations between intestinal bacteria and continuously measured physiological outcomes is important for understanding the bacteria-host relationship but is not straightforward since abundance data (compositional data) are not normally distributed. To address this issue, we developed a...
Main Authors: | Adrian Verster, Nicholas Petronella, Judy Green, Fernando Matias, Stephen P J Brooks |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2022-07-01 |
Series: | PLoS Computational Biology |
Online Access: | https://doi.org/10.1371/journal.pcbi.1010108 |
_version_ | 1798039529508044800 |
---|---|
author | Adrian Verster Nicholas Petronella Judy Green Fernando Matias Stephen P J Brooks |
author_sort | Adrian Verster |
collection | DOAJ |
description | Determining associations between intestinal bacteria and continuously measured physiological outcomes is important for understanding the bacteria-host relationship but is not straightforward since abundance data (compositional data) are not normally distributed. To address this issue, we developed a fully Bayesian linear regression model (BRACoD; Bayesian Regression Analysis of Compositional Data) with physiological measurements (continuous data) as a function of a matrix of relative bacterial abundances. Bacteria can be classified as operational taxonomic units or by taxonomy (genus, family, etc.). Bacteria associated with the physiological measurement were identified using a Bayesian variable selection method: Stochastic Search Variable Selection. The output is a list of inclusion probabilities ([Formula: see text]) and coefficients that indicate the strength of the association ([Formula: see text]) for each bacterial taxon. Tests with simulated communities showed that adopting a cut point value of [Formula: see text] ≥ 0.3 for identifying included bacteria optimized the true positive rate (TPR) while maintaining a false positive rate (FPR) of ≤ 5%. At this point, the chances of identifying non-contributing bacteria were low and all well-established contributors were included. Comparison with other methods showed that BRACoD (at [Formula: see text] ≥ 0.3) had higher precision and a higher TPR than a commonly used centered log-ratio transformed LASSO procedure (clr-LASSO) as well as a higher TPR than an off-the-shelf Spike and Slab method after centered log-ratio transformation (clr-SS). BRACoD was also less likely to include non-contributing bacteria that merely correlate with contributing bacteria. Analysis of a rat microbiome experiment identified 47 operational taxonomic units that contributed to fecal butyrate levels. Of these, 31 were positively and 16 negatively associated with butyrate. Consistent with their known role in butyrate metabolism, most of these fell within the Lachnospiraceae and Ruminococcaceae. We conclude that BRACoD provides a more precise and accurate method for determining bacteria associated with a continuous physiological outcome compared to clr-LASSO. It is more sensitive than a generalized clr-SS algorithm, although it has a higher FPR. Its ability to distinguish genuine contributors from correlated bacteria makes it better suited to discriminating bacteria that directly contribute to an outcome. The algorithm corrects for the distortions arising from compositional data, making it appropriate for analysis of microbiome data. |
first_indexed | 2024-04-11T21:55:09Z |
format | Article |
id | doaj.art-c111df3a15c543f2b6d69137493e732f |
institution | Directory Open Access Journal |
issn | 1553-734X 1553-7358 |
language | English |
last_indexed | 2024-04-11T21:55:09Z |
publishDate | 2022-07-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS Computational Biology |
spelling | doaj.art-c111df3a15c543f2b6d69137493e732f; 2022-12-22T04:01:07Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X, 1553-7358; 2022-07-01; 18(7): e1010108; 10.1371/journal.pcbi.1010108; https://doi.org/10.1371/journal.pcbi.1010108 |
title | A Bayesian method for identifying associations between response variables and bacterial community composition. |
url | https://doi.org/10.1371/journal.pcbi.1010108 |
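
The cut-point step described in the abstract (keeping taxa whose inclusion probability is at least 0.3, then judging the result by TPR, FPR, and precision) can be illustrated with a minimal Python sketch. It assumes inclusion probabilities have already been estimated by some model. It does not implement BRACoD or Stochastic Search Variable Selection, and the function names `select_taxa` and `rates`, the OTU labels, and every numeric value other than the 0.3 cut point are hypothetical, invented for illustration only.

```python
from typing import Dict, Set


def select_taxa(inclusion_prob: Dict[str, float], cut: float = 0.3) -> Set[str]:
    """Return the taxa whose inclusion probability is at or above the cut point."""
    return {taxon for taxon, p in inclusion_prob.items() if p >= cut}


def rates(selected: Set[str], truth: Set[str], all_taxa: Set[str]) -> Dict[str, float]:
    """Compute TPR, FPR, and precision for a selection against known contributors."""
    tp = len(selected & truth)             # contributors that were selected
    fp = len(selected - truth)             # non-contributors that were selected
    fn = len(truth - selected)             # contributors that were missed
    tn = len(all_taxa - selected - truth)  # non-contributors correctly left out
    return {
        "tpr": tp / (tp + fn) if (tp + fn) else 0.0,
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Hypothetical inclusion probabilities for five simulated OTUs (made-up values).
probs = {"otu_1": 0.92, "otu_2": 0.41, "otu_3": 0.29, "otu_4": 0.05, "otu_5": 0.33}
# Hypothetical ground truth: the OTUs that really contribute in the simulation.
truth = {"otu_1", "otu_2", "otu_3"}

selected = select_taxa(probs, cut=0.3)
metrics = rates(selected, truth, set(probs))
print(sorted(selected))
print(metrics)
```

Re-running `select_taxa` over a range of `cut` values on simulated communities with known contributors is one way to see how the trade-off between TPR and FPR moves with the cut point.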