A bacterial sensor taxonomy across earth ecosystems for machine learning applications
ABSTRACT: Microbial communities have evolved to colonize all ecosystems of the planet, from the deep sea to the human gut. Microbes survive by sensing, responding, and adapting to immediate environmental cues. This process is driven by signal transduction proteins such as histidine kinases, which use...
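Main Authors: | Helen Park, Marcin P. Joachimiak, Sean P. Jungbluth, Ziming Yang, William J. Riehl, R. Shane Canon, Adam P. Arkin, Paramvir S. Dehal |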
---|---|
Format: | Article |
Language: | English |
Published: | American Society for Microbiology, 2024-01-01 |
Series: | mSystems |
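Subjects: | metagenomics; machine learning; histidine kinase; sensory transduction processes; human microbiome; feature importance |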
Online Access: | https://journals.asm.org/doi/10.1128/msystems.00026-23 |
_version_ | 1827376003219980288 |
---|---|
author | Helen Park; Marcin P. Joachimiak; Sean P. Jungbluth; Ziming Yang; William J. Riehl; R. Shane Canon; Adam P. Arkin; Paramvir S. Dehal |
collection | DOAJ |
description | ABSTRACT: Microbial communities have evolved to colonize all ecosystems of the planet, from the deep sea to the human gut. Microbes survive by sensing, responding, and adapting to immediate environmental cues. This process is driven by signal transduction proteins such as histidine kinases, which use their sensing domains to bind or otherwise detect environmental cues and “transduce” signals to adjust internal processes. We hypothesized that an ecosystem’s unique stimuli leave a sensor “fingerprint” that can identify ecosystem conditions and provide insight into them. To test this, we collected 20,712 publicly available metagenomes from Host-associated, Environmental, and Engineered ecosystems across the globe. We extracted and clustered the collection’s nearly 18M unique sensory domains into 113,712 similar groupings with MMseqs2. We built gradient-boosted decision tree machine learning models and found we could classify the ecosystem type (accuracy: 87%) and predict the levels of different physical parameters (R² score: 83%) using the sensor cluster abundance as features. Feature importance enables identification of the most predictive sensors to differentiate between ecosystems, which can lead to mechanistic interpretations if the sensor domains are well annotated. To demonstrate this, a machine learning model was trained to predict a patient’s disease state and used to identify domains related to oxygen sensing that are present in a healthy gut but missing in patients with abnormal conditions. Moreover, since 98.7% of identified sensor domains are uncharacterized, importance ranking can be used to prioritize sensors to determine what ecosystem function they may be sensing. Furthermore, these new predictive sensors can function as targets for novel sensor engineering with applications in biotechnology, ecosystem maintenance, and medicine. IMPORTANCE: Microbes infect, colonize, and proliferate due to their ability to sense and respond quickly to their surroundings. In this research, we extract the sensory proteins from a diverse range of environmental, engineered, and host-associated metagenomes. We trained machine learning classifiers using sensors as features such that it is possible to predict the ecosystem for a metagenome from its sensor profile. We use the optimized model’s feature importance to identify the most impactful and predictive sensors in different environments. We next use the sensor profile from human gut metagenomes to classify their disease states and explore which sensors can explain differences between diseases. The sensors most predictive of environmental labels here, most of which correspond to uncharacterized proteins, are a useful starting point for the discovery of important environmental signals and the development of possible diagnostic interventions. |
first_indexed | 2024-03-08T12:02:53Z |
format | Article |
id | doaj.art-a625ab5429de49c5ae5cd1501a9ffcc7 |
institution | Directory Open Access Journal |
issn | 2379-5077 |
language | English |
last_indexed | 2024-03-08T12:02:53Z |
publishDate | 2024-01-01 |
publisher | American Society for Microbiology |
record_format | Article |
series | mSystems |
spelling | doaj.art-a625ab5429de49c5ae5cd1501a9ffcc7; 2024-01-23T14:00:49Z; eng; American Society for Microbiology; mSystems; 2379-5077; 2024-01-01; 9; 1; 10.1128/msystems.00026-23; A bacterial sensor taxonomy across earth ecosystems for machine learning applications; Helen Park [0]; Marcin P. Joachimiak [1]; Sean P. Jungbluth [2]; Ziming Yang [3]; William J. Riehl [4]; R. Shane Canon [5]; Adam P. Arkin [6]; Paramvir S. Dehal [7]; [0] Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua-Peking Center for Life Sciences, Tsinghua University, Beijing, China; [1] Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA; [2] Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA; [3] Computational Science Initiative, Brookhaven National Laboratory, Upton, New York, USA; [4] Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA; [5] Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA; [6] Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA; [7] Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA; https://journals.asm.org/doi/10.1128/msystems.00026-23; metagenomics; machine learning; histidine kinase; sensory transduction processes; human microbiome; feature importance |
title | A bacterial sensor taxonomy across earth ecosystems for machine learning applications |
topic | metagenomics; machine learning; histidine kinase; sensory transduction processes; human microbiome; feature importance |
url | https://journals.asm.org/doi/10.1128/msystems.00026-23 |
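
The abstract describes using sensor cluster abundance as features for gradient-boosted decision tree models and then ranking sensors by feature importance. The following is a minimal, standard-library-only sketch of that idea on toy data, not the authors' pipeline. All names here (`abundance_matrix`, `fit_stump`, `boost`, `predict_proba`, the cluster IDs `c1` to `c3`, and the sample names) are hypothetical. It also simplifies heavily: the trees are one-split stumps, leaf values are plain mean residuals rather than the Newton-style updates that production libraries use, and importance is the summed squared-error reduction per feature.

```python
import math
from collections import Counter


def abundance_matrix(assignments, clusters):
    """Turn per-sample lists of cluster IDs into relative-abundance rows."""
    rows = {}
    for sample, hits in assignments.items():
        counts = Counter(hits)
        total = sum(counts.values()) or 1  # guard against empty samples
        rows[sample] = [counts[c] / total for c in clusters]
    return rows


def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_stump(X, residuals):
    """Find the one-feature threshold split that best fits the residuals."""
    parent = sse(residuals)
    best = None  # (gain, feature, threshold, left_value, right_value)
    for j in range(len(X[0])):
        # Skip the largest value: splitting there leaves the right side empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            gain = parent - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, j, t, sum(left) / len(left), sum(right) / len(right))
    return best


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def boost(X, y, rounds=20, learning_rate=0.5):
    """Boost stumps on log-loss; return the stumps and per-feature gain."""
    scores = [0.0] * len(X)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(rounds):
        # Negative gradient of log-loss with respect to the current score.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:  # every feature is constant, nothing to split on
            break
        gain, j, t, left_value, right_value = stump
        lv, rv = learning_rate * left_value, learning_rate * right_value
        stumps.append((j, t, lv, rv))
        importance[j] += gain
        scores = [s + (lv if row[j] <= t else rv) for row, s in zip(X, scores)]
    return stumps, importance


def predict_proba(stumps, row):
    """Probability of the positive class for one abundance row."""
    return sigmoid(sum(lv if row[j] <= t else rv for j, t, lv, rv in stumps))


# Toy data: cluster IDs observed in each (hypothetical) metagenome.
clusters = ["c1", "c2", "c3"]
assignments = {
    "gut_1": ["c1", "c1", "c1", "c3"],
    "gut_2": ["c1", "c1", "c3", "c3"],
    "gut_3": ["c1", "c1", "c1", "c2"],
    "soil_1": ["c2", "c2", "c3", "c3"],
    "soil_2": ["c2", "c2", "c2", "c3"],
    "soil_3": ["c1", "c2", "c3", "c3"],
}
labels = {name: 1 if name.startswith("gut") else 0 for name in assignments}

rows = abundance_matrix(assignments, clusters)
samples = sorted(rows)
X = [rows[s] for s in samples]
y = [labels[s] for s in samples]

stumps, importance = boost(X, y)
ranking = sorted(zip(clusters, importance), key=lambda p: p[1], reverse=True)
print("most predictive cluster:", ranking[0][0])
```

In this toy set only `c1` separates the two labels cleanly, so it accumulates all of the gain. With real data, a maintained gradient-boosting library and held-out evaluation would be the natural choice; the sketch only illustrates how an abundance table feeds a boosted model and how an importance ranking falls out of it.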