A Phylogeny-Regularized Sparse Regression Model for Predictive Modeling of Microbial Community Data
Fueled by technological advancement, there has been a surge of human microbiome studies surveying the microbial communities associated with the human body and their links with health and disease. As a complement to the human genome, the human microbiome holds great potential for precision medicine....
Main Authors: | Jian Xiao, Li Chen, Yue Yu, Xianyang Zhang, Jun Chen |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2018-12-01 |
Series: | Frontiers in Microbiology |
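Subjects: | microbiome; phylogenetic tree; sparse generalized linear model; predictive model; statistical modeling; high-dimensional statistics |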
Online Access: | https://www.frontiersin.org/article/10.3389/fmicb.2018.03112/full |
_version_ | 1828421798869336064 |
---|---|
author | Jian Xiao; Li Chen; Yue Yu; Xianyang Zhang; Jun Chen |
collection | DOAJ |
description | Fueled by technological advancement, there has been a surge of human microbiome studies surveying the microbial communities associated with the human body and their links with health and disease. As a complement to the human genome, the human microbiome holds great potential for precision medicine. Efficient predictive models based on microbiome data could be potentially used in various clinical applications such as disease diagnosis, patient stratification and drug response prediction. One important characteristic of the microbial community data is the phylogenetic tree that relates all the microbial taxa based on their evolutionary history. The phylogenetic tree is an informative prior for more efficient prediction since the microbial community changes are usually not randomly distributed on the tree but tend to occur in clades at varying phylogenetic depths (clustered signal). Although community-wide changes are possible for some conditions, it is also likely that the community changes are only associated with a small subset of “marker” taxa (sparse signal). Unfortunately, predictive models of microbial community data taking into account both the sparsity and the tree structure remain under-developed. In this paper, we propose a predictive framework to exploit sparse and clustered microbiome signals using a phylogeny-regularized sparse regression model. Our approach is motivated by evolutionary theory, where a natural correlation structure among microbial taxa exists according to the phylogenetic relationship. A novel phylogeny-based smoothness penalty is proposed to smooth the coefficients of the microbial taxa with respect to the phylogenetic tree. Using simulated and real datasets, we show that our method achieves better prediction performance than competing sparse regression methods for sparse and clustered microbiome signals. |
first_indexed | 2024-12-10T15:38:02Z |
id | doaj.art-a2b3147a66934a92a9c04940f9805f99 |
institution | Directory of Open Access Journals |
issn | 1664-302X |
last_indexed | 2024-12-10T15:38:02Z |
spelling | doaj.art-a2b3147a66934a92a9c04940f9805f99; 2022-12-22T01:43:11Z; eng; Frontiers Media S.A.; Frontiers in Microbiology; ISSN 1664-302X; 2018-12-01; vol. 9; DOI 10.3389/fmicb.2018.03112; identifier 422587. Author affiliations: Jian Xiao (Division of Biomedical Statistics and Informatics, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, United States; School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China); Li Chen (Department of Health Outcomes Research and Policy, Harrison School of Pharmacy, Auburn University, Auburn, AL, United States); Yue Yu (Division of Biomedical Statistics and Informatics, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, United States); Xianyang Zhang (Department of Statistics, Texas A&M University, College Station, TX, United States); Jun Chen (Division of Biomedical Statistics and Informatics, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, United States). |
title | A Phylogeny-Regularized Sparse Regression Model for Predictive Modeling of Microbial Community Data |
topic | microbiome; phylogenetic tree; sparse generalized linear model; predictive model; statistical modeling; high-dimensional statistics |
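
The description above refers to a sparse regression whose coefficients are smoothed with respect to the phylogenetic tree. The record does not give the penalty itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a least-squares loss, an L1 penalty for sparsity, and a graph-Laplacian quadratic penalty built from a pairwise phylogenetic distance matrix, fitted by proximal gradient descent. The function names, the similarity `exp(-dist)`, and the tuning parameters `lam1` and `lam2` are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of the L1 norm: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def laplacian_from_distances(dist):
    """Graph Laplacian from pairwise distances between taxa.

    Assumption: similarity decays as exp(-distance). Closely related taxa
    get large weights, so their coefficients are pulled together.
    """
    w = np.exp(-dist)
    np.fill_diagonal(w, 0.0)  # no self-loops
    return np.diag(w.sum(axis=1)) - w


def objective(X, y, lap, lam1, lam2, beta):
    """Least squares + L1 sparsity penalty + Laplacian smoothness penalty."""
    n = X.shape[0]
    resid = y - X @ beta
    return (resid @ resid / (2 * n)
            + lam1 * np.abs(beta).sum()
            + 0.5 * lam2 * beta @ lap @ beta)


def fit_smooth_lasso(X, y, lap, lam1, lam2, n_iter=2000):
    """Proximal gradient descent (ISTA) on the objective above."""
    n, p = X.shape
    hess = X.T @ X / n + lam2 * lap          # Hessian of the smooth part
    step = 1.0 / np.linalg.eigvalsh(hess)[-1]  # 1 / largest eigenvalue
    xty = X.T @ y / n
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = hess @ beta - xty             # gradient of the smooth part
        beta = soft_threshold(beta - step * grad, step * lam1)
    return beta
```

In this sketch `lam1` controls how many taxa are selected and `lam2` controls how strongly the coefficients of neighbouring taxa are tied together. In practice both would be chosen by cross-validation.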