Kernel principal components based cascade forest towards disease identification with human microbiota

Abstract Background Numerous pieces of clinical evidence have shown that many phenotypic traits of human disease are related to the gut microbiome, e.g., inflammation, obesity, HIV, and diabetes. Through supervised classification, it is feasible to determine the human disease states by revealing t...

Bibliographic Details
Main Authors: Jiayu Zhou, Yanqing Ye, Jiang Jiang
Format: Article
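Language: English
Published: BMC 2021-12-01
Series: BMC Medical Informatics and Decision Making
Subjects: Human microbiota; Supervised classification; Kernel principal components; Cascade forest; Disease identification
Online Access: https://doi.org/10.1186/s12911-021-01705-5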
author Jiayu Zhou
Yanqing Ye
Jiang Jiang
collection DOAJ
description Abstract Background: Numerous pieces of clinical evidence have linked many phenotypic traits of human disease, e.g., inflammation, obesity, HIV, and diabetes, to the gut microbiome. Supervised classification makes it feasible to determine human disease states from the compositional information of the intestinal microbiota. However, because the abundance matrix of microbiome data is very sparse, an interpretable deep model, such as the deep forest model, is needed to further represent and mine the data. Moreover, the original deep forest model can still overfit when dealing with such “large p, small n” biological data. Feature reduction is therefore considered as a way to improve the ensemble forest model, especially for disease identification from the human microbiota. Methods: In this work, we propose the kernel principal components based cascade forest method, called KPCCF, to classify the disease states of patients using taxonomic profiles of the microbiome at the family level. Specifically, kernel principal component analysis is first used to reduce the original dimension of the human microbiota datasets. The processed data is then fed into the cascade forest to preliminarily discriminate the disease state of the samples. Results: The proposed KPCCF algorithm can represent small-scale, high-dimensional human microbiota datasets with sparse feature matrices. Systematic comparison experiments on 4 datasets demonstrate that our method consistently outperforms the state-of-the-art methods. Conclusion: Despite sharing some common characteristics, a one-size-fits-all solution does not exist in any space. The traditional deep model has limitations in biological applications where small sample sizes and high dimensionality are out of balance. KPCCF distinguishes itself from the standard deep forest model by its excellent performance in the microbiota field. Additionally, compared to other dimensionality reduction methods, kernel principal component analysis is more suitable for microbiota datasets.
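The two-stage pipeline described in the Methods part of the abstract (kernel principal component analysis followed by a cascade forest) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, an RBF kernel, a fixed two-level cascade, and `ExtraTreesClassifier` as a stand-in for the completely-random forests of the deep forest literature. The function names `make_forests`, `fit_predict_cascade`, and `kpccf_sketch`, and all hyperparameter values, are hypothetical.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict


def make_forests(seed):
    # Assumption: two forests per cascade level; sizes are illustrative only.
    return [
        RandomForestClassifier(n_estimators=50, random_state=seed),
        ExtraTreesClassifier(n_estimators=50, random_state=seed + 1),
    ]


def fit_predict_cascade(X_train, y_train, X_test, n_levels=2, cv=3):
    """Train a small fixed-depth cascade and return predicted test labels."""
    classes = np.unique(y_train)  # sorted, matching predict_proba column order
    aug_train, aug_test = X_train, X_test
    for level in range(n_levels):
        train_probas, test_probas = [], []
        for forest in make_forests(seed=10 * level):
            # Out-of-fold class vectors for training samples reduce overfitting.
            train_probas.append(
                cross_val_predict(forest, aug_train, y_train, cv=cv,
                                  method="predict_proba")
            )
            forest.fit(aug_train, y_train)
            test_probas.append(forest.predict_proba(aug_test))
        # The next level sees the reduced features plus this level's class vectors.
        aug_train = np.hstack([X_train] + train_probas)
        aug_test = np.hstack([X_test] + test_probas)
    # Final decision: average the last level's class vectors, take the argmax.
    mean_proba = np.mean(test_probas, axis=0)
    return classes[np.argmax(mean_proba, axis=1)]


def kpccf_sketch(X_train, y_train, X_test, n_components=10):
    # Stage 1: kernel PCA reduces the sparse, high-dimensional abundance matrix.
    kpca = KernelPCA(n_components=n_components, kernel="rbf")
    Z_train = kpca.fit_transform(X_train)
    Z_test = kpca.transform(X_test)
    # Stage 2: the reduced representation is fed into the cascade forest.
    return fit_predict_cascade(Z_train, y_train, Z_test)
```

Given an abundance matrix split into training and test samples, `kpccf_sketch(X_train, y_train, X_test)` returns one predicted disease label per test sample. The kernel, the number of components, and the cascade depth would need to be tuned per dataset; the record does not state the values used in the article.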
first_indexed 2024-12-22T00:21:35Z
format Article
id doaj.art-fa7b501cfb124966b5f5ec96106a598c
institution Directory Open Access Journal
issn 1472-6947
language English
last_indexed 2024-12-22T00:21:35Z
publishDate 2021-12-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
spelling doaj.art-fa7b501cfb124966b5f5ec96106a598c
2022-12-21T18:45:09Z
eng
BMC
BMC Medical Informatics and Decision Making
1472-6947
2021-12-01
21 1 1 15
10.1186/s12911-021-01705-5
Kernel principal components based cascade forest towards disease identification with human microbiota
Jiayu Zhou (National University of Defense Technology)
Yanqing Ye (Consulting Center for Strategic Assessment, Academy of Military Sciences)
Jiang Jiang (National University of Defense Technology)
https://doi.org/10.1186/s12911-021-01705-5
Human microbiota
Supervised classification
Kernel principal components
Cascade forest
Disease identification
title Kernel principal components based cascade forest towards disease identification with human microbiota
topic Human microbiota
Supervised classification
Kernel principal components
Cascade forest
Disease identification
url https://doi.org/10.1186/s12911-021-01705-5