Automatic disease prediction from human gut metagenomic data using boosting GraphSAGE

Abstract Background The human microbiome plays a critical role in maintaining human health. Due to the recent advances in high-throughput sequencing technologies, the microbiome profiles present in the human body have become publicly available. Hence, many works have been done to analyze human micro...

Bibliographic Details
Main Authors: K. Syama, J. Angel Arul Jothi, Namita Khanna
Format: Article
Language: English
Published: BMC 2023-03-01
Series: BMC Bioinformatics
Subjects: Metagenomics; Disease prediction; Ensemble GNN; GraphSAGE; Machine learning; Deep learning
Online Access: https://doi.org/10.1186/s12859-023-05251-x
author K. Syama
J. Angel Arul Jothi
Namita Khanna
collection DOAJ
description Abstract Background: The human microbiome plays a critical role in maintaining human health. Owing to recent advances in high-throughput sequencing technologies, microbiome profiles of the human body have become publicly available, and many studies have analyzed them. These studies have found that healthy and sick individuals carry different microbiome profiles for a range of diseases. Recently, several computational methods have used microbiome profiles to automatically diagnose and classify the host phenotype. Results: In this work, a novel deep learning framework based on boosting GraphSAGE is proposed for automatic prediction of diseases from metagenomic data. The proposed framework has two main components: (a) a Metagenomic Disease graph (MD-graph) construction module and (b) a Disease Prediction Network (DP-Net) module. The graph construction module builds a graph in which each metagenomic sample is a node, and the relationships between samples are captured using a proximity measure. The DP-Net consists of a boosting GraphSAGE model that predicts the status of a sample as sick or healthy. The effectiveness of the proposed method is verified using real and synthetic datasets for inflammatory bowel disease and colorectal cancer. The proposed model achieved a best AUC of 93%, accuracy of 95%, F1-score of 95% and AUPRC of 95% on the real inflammatory bowel disease dataset, and a best AUC of 90%, accuracy of 91%, F1-score of 87% and AUPRC of 93% on the real colorectal cancer dataset. Conclusion: The proposed framework outperforms other machine learning and deep learning models in terms of classification accuracy, AUC, F1-score and AUPRC for both synthetic and real metagenomic data.
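The two steps named in the abstract (a sample graph built from a proximity measure, followed by GraphSAGE-style neighbor aggregation) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: Euclidean distance as the proximity measure, a k-nearest-neighbor rule for edges, mean aggregation, the toy abundance values, and the helper names `build_knn_graph` and `mean_aggregate`. The learned weight matrices, the nonlinearity, and the boosting ensemble are omitted.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors (assumed proximity measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_knn_graph(samples, k):
    """Treat each sample as a node and link it to its k nearest samples.

    Returns a dict mapping node index -> set of neighbor indices (undirected).
    """
    n = len(samples)
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: euclidean(samples[i], samples[j]),
        )
        for j in ranked[:k]:
            neighbors[i].add(j)
            neighbors[j].add(i)  # keep the graph undirected
    return neighbors


def mean_aggregate(samples, neighbors):
    """One GraphSAGE-style mean aggregation step without learned weights.

    Each node's output is its own features concatenated with the mean of
    its neighbors' features.
    """
    out = []
    for i, feats in enumerate(samples):
        nbrs = sorted(neighbors[i])
        if nbrs:
            mean = [
                sum(samples[j][d] for j in nbrs) / len(nbrs)
                for d in range(len(feats))
            ]
        else:
            mean = [0.0] * len(feats)
        out.append(list(feats) + mean)
    return out


# Hypothetical toy abundance profiles: four samples, two taxa each.
samples = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
graph = build_knn_graph(samples, k=1)
embeddings = mean_aggregate(samples, graph)
```

In a full GraphSAGE layer the concatenated vector would pass through a trained linear map and an activation, and a classifier on the resulting embeddings would label each sample as sick or healthy. How the paper combines several such models through boosting is not described in this record, so it is not sketched here.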
first_indexed 2024-03-09T14:52:35Z
format Article
id doaj.art-b7af43096312434baed8c6ea118be121
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-03-09T14:52:35Z
publishDate 2023-03-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-b7af43096312434baed8c6ea118be121; 2023-11-26T14:23:25Z; eng; BMC; BMC Bioinformatics; ISSN 1471-2105; 2023-03-01; vol. 24, no. 1, pp. 1-29; 10.1186/s12859-023-05251-x
Automatic disease prediction from human gut metagenomic data using boosting GraphSAGE
K. Syama (Department of Computer Science, Birla Institute of Technology and Science Pilani Dubai Campus)
J. Angel Arul Jothi (Department of Computer Science, Birla Institute of Technology and Science Pilani Dubai Campus)
Namita Khanna (Department of Biotechnology, Birla Institute of Technology and Science Pilani Dubai Campus)
https://doi.org/10.1186/s12859-023-05251-x
Metagenomics; Disease prediction; Ensemble GNN; GraphSAGE; Machine learning; Deep learning
title Automatic disease prediction from human gut metagenomic data using boosting GraphSAGE
topic Metagenomics
Disease prediction
Ensemble GNN
GraphSAGE
Machine learning
Deep learning
url https://doi.org/10.1186/s12859-023-05251-x