A mixed-model approach for estimating drivers of microbiota community composition and differential taxonomic abundance

Bibliographic Details
Main Authors: Amy R. Sweeny, Hannah Lemon, Anan Ibrahim, Kathryn A. Watt, Kenneth Wilson, Dylan Z. Childs, Daniel H. Nussey, Andrew Free, Luke McNally
Format: Article
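Language: English
Published: American Society for Microbiology 2023-08-01
Series: mSystems
Subjects: microbiota; metabarcoding; 16S; amplicon sequence variants; generalized linear mixed-effects model; community composition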
Online Access:https://journals.asm.org/doi/10.1128/msystems.00040-23
Collection: DOAJ (Directory of Open Access Journals)

ABSTRACT Next-generation sequencing (NGS) and metabarcoding approaches are increasingly applied to wild animal populations, but there is a disconnect between the generalized linear mixed model (GLMM) approaches widely used to study phenotypic variation and the statistical toolkit from community ecology typically applied to metabarcoding data. Here, we describe the suitability of a novel GLMM-based approach for analyzing the taxon-specific sequence read counts derived from standard metabarcoding data. This approach decomposes the contributions of different drivers (e.g., age, season, individual) to variation in community composition via interaction terms in the model's random-effects structure. We provide guidance on implementing this approach and show how these models can identify the extent to which specific taxonomic groups are responsible for the effects attributed to different drivers. We applied this approach to two cross-sectional data sets from the Soay sheep population of St. Kilda. GLMMs agreed with dissimilarity-based approaches, highlighting the substantial contribution of age and the minimal contribution of season to microbiota community composition, and simultaneously estimated the contributions of other technical and biological factors. We further used model predictions to show that age effects were principally due to increases in taxa of the phylum Bacteroidetes and declines in taxa of the phylum Firmicutes. This approach offers a powerful means of understanding the drivers of community structure in metabarcoding data. We discuss how our approach could be readily adapted to estimate the contributions of additional factors, such as host or microbe phylogeny, to answer emerging questions about the ecological and evolutionary roles of within-host communities.

IMPORTANCE NGS and fecal metabarcoding methods have provided powerful opportunities to study the wild gut microbiome. Data are therefore amassing across wild systems, creating a need for analytical approaches that can simultaneously investigate the host- and environmental-scale factors that determine the composition of these communities. Here, we describe a generalized linear mixed-effects model (GLMM) approach to analyzing read count data from metabarcoding of the gut microbiota, which allows us to quantify the contributions of multiple host and environmental factors to within-host community structure. Our approach provides outputs familiar to most field ecologists and can be run using any standard mixed-effects modeling package. We illustrate it with two worked examples investigating age and season effects in metabarcoding data sets from the Soay sheep population of St. Kilda.

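The variance-decomposition idea described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a Poisson GLMM with a log link has already been fitted elsewhere (for example with an R mixed-model package) to taxon-by-sample read counts, and that each driver enters as a taxon-by-driver random-effect interaction. The term names, the "taxon:" prefix convention, the choice to leave non-interaction terms out of the denominator, and all numbers are hypothetical placeholders, not results from the study. A second helper averages hypothetical per-taxon predicted effects within a taxonomic group, one simple way to ask which groups drive an effect.

```python
from typing import Dict, Mapping

# Illustrative sketch only. Assumes a Poisson GLMM (log link) was fitted elsewhere
# to taxon-by-sample read counts, with random-effect terms such as (lme4 notation):
#   count ~ 1 + offset(log(depth)) + (1|sample) + (1|taxon)
#           + (1|taxon:age) + (1|taxon:season) + (1|taxon:individual) + (1|taxon:sample)
# Term names and the "taxon:" prefix convention are hypothetical.


def composition_shares(variances: Mapping[str, float], prefix: str = "taxon:") -> Dict[str, float]:
    """Share of compositional variance attributed to each taxon-interaction term.

    A term such as "taxon:age" lets each taxon respond differently to age, so its
    variance reflects shifts in relative abundance (composition). Terms without
    the prefix (per-sample totals, mean taxon abundance) are excluded here.
    """
    terms = {name: v for name, v in variances.items() if name.startswith(prefix)}
    if any(v < 0 for v in terms.values()):
        raise ValueError("variance components must be non-negative")
    total = sum(terms.values())
    if total <= 0:
        raise ValueError("no taxon-interaction variance to partition")
    return {name: v / total for name, v in terms.items()}


def mean_effect_by_group(effects: Mapping[str, float], group_of: Mapping[str, str]) -> Dict[str, float]:
    """Average per-taxon effects (e.g. predicted age effects) within each group (e.g. phylum)."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for taxon, effect in effects.items():
        group = group_of[taxon]
        sums[group] = sums.get(group, 0.0) + effect
        counts[group] = counts.get(group, 0) + 1
    return {group: sums[group] / counts[group] for group in sums}


if __name__ == "__main__":
    # Placeholder variance components for illustration; NOT estimates from the study.
    variances = {
        "sample": 0.5,
        "taxon": 2.0,
        "taxon:age": 0.4,
        "taxon:season": 0.1,
        "taxon:individual": 0.3,
        "taxon:sample": 1.2,
    }
    for term, share in composition_shares(variances).items():
        print(f"{term:18s} {share:.3f}")

    # Hypothetical per-taxon age effects (log scale) and hypothetical group labels.
    age_effects = {"asv1": 0.30, "asv2": 0.10, "asv3": -0.20, "asv4": -0.40}
    groups = {"asv1": "PhylumA", "asv2": "PhylumA", "asv3": "PhylumB", "asv4": "PhylumB"}
    print(mean_effect_by_group(age_effects, groups))
```

Excluding the taxon main effect and the per-sample term from the denominator is only one possible convention; including them, or working with a latent-scale total variance, would change the shares.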
ISSN: 2379-5077
Volume: 8
Issue: 4
DOI: 10.1128/msystems.00040-23
Affiliations:
Amy R. Sweeny: Institute of Ecology & Evolution, University of Edinburgh, Edinburgh, United Kingdom
Hannah Lemon: Institute of Ecology & Evolution, University of Edinburgh, Edinburgh, United Kingdom
Anan Ibrahim: Biochemistry and Biotechnology, Institute of Quantitative Biology, University of Edinburgh, Edinburgh, United Kingdom
Kathryn A. Watt: Institute of Ecology & Evolution, University of Edinburgh, Edinburgh, United Kingdom
Kenneth Wilson: Lancaster Environment Centre, Lancaster University, Lancaster, United Kingdom
Dylan Z. Childs: School of Biosciences, University of Sheffield, Sheffield, United Kingdom
Daniel H. Nussey: Institute of Ecology & Evolution, University of Edinburgh, Edinburgh, United Kingdom
Andrew Free: Biochemistry and Biotechnology, Institute of Quantitative Biology, University of Edinburgh, Edinburgh, United Kingdom
Luke McNally: Institute of Ecology & Evolution, University of Edinburgh, Edinburgh, United Kingdom