A bacterial sensor taxonomy across earth ecosystems for machine learning applications

Bibliographic Details
Main Authors: Helen Park, Marcin P. Joachimiak, Sean P. Jungbluth, Ziming Yang, William J. Riehl, R. Shane Canon, Adam P. Arkin, Paramvir S. Dehal
Format: Article
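Language: English
Published: American Society for Microbiology 2024-01-01
Series: mSystems
Subjects: metagenomics; machine learning; histidine kinase; sensory transduction processes; human microbiome; feature importance
Online Access: https://journals.asm.org/doi/10.1128/msystems.00026-23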
author Helen Park
Marcin P. Joachimiak
Sean P. Jungbluth
Ziming Yang
William J. Riehl
R. Shane Canon
Adam P. Arkin
Paramvir S. Dehal
collection DOAJ
description ABSTRACT: Microbial communities have evolved to colonize all ecosystems of the planet, from the deep sea to the human gut. Microbes survive by sensing, responding, and adapting to immediate environmental cues. This process is driven by signal transduction proteins such as histidine kinases, which use their sensing domains to bind or otherwise detect environmental cues and “transduce” signals to adjust internal processes. We hypothesized that an ecosystem’s unique stimuli leave a sensor “fingerprint” that can identify ecosystem conditions and offer insight into them. To test this, we collected 20,712 publicly available metagenomes from Host-associated, Environmental, and Engineered ecosystems across the globe. We extracted the collection’s nearly 18M unique sensory domains and clustered them into 113,712 groups of similar sequences with MMseqs2. We built gradient-boosted decision tree machine learning models and found that, using sensor cluster abundance as features, we could classify the ecosystem type (accuracy: 87%) and predict the levels of different physical parameters (R² score: 83%). Feature importance identifies the sensors that are most predictive for differentiating between ecosystems, which can lead to mechanistic interpretations if the sensor domains are well annotated. To demonstrate this, a machine learning model was trained to predict patients’ disease states and used to identify domains related to oxygen sensing that are present in a healthy gut but missing in patients with abnormal conditions. Moreover, since 98.7% of identified sensor domains are uncharacterized, importance ranking can be used to prioritize which sensors to study in order to determine what ecosystem function they may be sensing. Furthermore, these new predictive sensors can function as targets for novel sensor engineering with applications in biotechnology, ecosystem maintenance, and medicine.
IMPORTANCE: Microbes infect, colonize, and proliferate due to their ability to sense and respond quickly to their surroundings. In this research, we extract the sensory proteins from a diverse range of environmental, engineered, and host-associated metagenomes. We trained machine learning classifiers using sensors as features such that it is possible to predict the ecosystem for a metagenome from its sensor profile. We use the optimized model’s feature importance to identify the most impactful and predictive sensors in different environments. We next use the sensor profile from human gut metagenomes to classify their disease states and explore which sensors can explain differences between diseases. The sensors most predictive of environmental labels here, most of which correspond to uncharacterized proteins, are a useful starting point for the discovery of important environmental signals and the development of possible diagnostic interventions.
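The abstract describes using sensor cluster abundance as model features. As an illustration only, the following minimal sketch shows one way such a feature table could be assembled from a two-column cluster listing (representative, member), which is the layout MMseqs2 writes in its cluster TSV output. The function names, the toy domain and sample identifiers, and the choice of relative abundance as the normalization are assumptions made for this example; the record does not state how the authors computed abundance.

```python
from collections import Counter, defaultdict


def cluster_abundance(cluster_rows, domain_to_sample):
    """Return {sample: {cluster_representative: relative abundance}}.

    cluster_rows: iterable of (representative, member) pairs.
    domain_to_sample: maps each member domain ID to its metagenome sample.
    Both inputs are hypothetical stand-ins for real pipeline files.
    """
    counts = defaultdict(Counter)
    for representative, member in cluster_rows:
        sample = domain_to_sample.get(member)
        if sample is None:
            continue  # skip domains with no known sample of origin
        counts[sample][representative] += 1

    profiles = {}
    for sample, counter in counts.items():
        total = sum(counter.values())
        # Assumption: normalize counts to fractions within each sample.
        profiles[sample] = {rep: n / total for rep, n in counter.items()}
    return profiles


def to_matrix(profiles):
    """Turn the nested dict into (samples, clusters, rows) with zeros filled in."""
    samples = sorted(profiles)
    clusters = sorted({rep for p in profiles.values() for rep in p})
    rows = [[profiles[s].get(c, 0.0) for c in clusters] for s in samples]
    return samples, clusters, rows


# Toy data, invented for illustration.
cluster_rows = [
    ("domA", "domA"), ("domA", "domB"),
    ("domC", "domC"), ("domC", "domD"), ("domC", "domE"),
]
domain_to_sample = {
    "domA": "gut_1", "domB": "soil_1",
    "domC": "gut_1", "domD": "gut_1", "domE": "soil_1",
}

profiles = cluster_abundance(cluster_rows, domain_to_sample)
samples, clusters, matrix = to_matrix(profiles)
for sample, row in zip(samples, matrix):
    print(sample, dict(zip(clusters, row)))
```

Each row of `matrix` is one metagenome and each column one sensor cluster, which is the shape a gradient-boosted tree library would typically expect as input alongside a list of ecosystem labels.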
first_indexed 2024-03-08T12:02:53Z
format Article
id doaj.art-a625ab5429de49c5ae5cd1501a9ffcc7
institution Directory Open Access Journal
issn 2379-5077
language English
last_indexed 2024-03-08T12:02:53Z
publishDate 2024-01-01
publisher American Society for Microbiology
record_format Article
series mSystems
spelling doaj.art-a625ab5429de49c5ae5cd1501a9ffcc7 2024-01-23T14:00:49Z
eng
American Society for Microbiology
mSystems, ISSN 2379-5077, 2024-01-01, volume 9, issue 1
10.1128/msystems.00026-23
A bacterial sensor taxonomy across earth ecosystems for machine learning applications
Helen Park (Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua-Peking Center for Life Sciences, Tsinghua University, Beijing, China)
Marcin P. Joachimiak (Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA)
Sean P. Jungbluth (Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA)
Ziming Yang (Computational Science Initiative, Brookhaven National Laboratory, Upton, New York, USA)
William J. Riehl (Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA)
R. Shane Canon (Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA)
Adam P. Arkin (Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA)
Paramvir S. Dehal (Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA)
title A bacterial sensor taxonomy across earth ecosystems for machine learning applications
topic metagenomics
machine learning
histidine kinase
sensory transduction processes
human microbiome
feature importance
url https://journals.asm.org/doi/10.1128/msystems.00026-23