Multi-Omics integration can be used to rescue metabolic information for some of the dark region of the Pseudomonas putida proteome

Abstract In every omics experiment, genes or their products are identified for which even state of the art tools are unable to assign a function. In the biotechnology chassis organism Pseudomonas putida, these proteins of unknown function make up 14% of the proteome. This missing information can bia...


Bibliographic Details
Main Authors: Steven Tavis, Robert L. Hettich
Format: Article
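Language: English
Published: BMC 2024-03-01
Series: BMC Genomics
Subjects: Multi-omics integration; Proteins of unknown function; Machine learning; Gene ontology; Pseudomonas putida; Function prediction
Online Access: https://doi.org/10.1186/s12864-024-10082-y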
author Steven Tavis
Robert L. Hettich
author_facet Steven Tavis
Robert L. Hettich
author_sort Steven Tavis
collection DOAJ
description Abstract In every omics experiment, genes or their products are identified for which even state-of-the-art tools are unable to assign a function. In the biotechnology chassis organism Pseudomonas putida, these proteins of unknown function make up 14% of the proteome. This missing information can bias analyses, since these proteins can carry out functions that affect the engineering of organisms. Because they predict protein function across all organisms, function prediction tools generally fail to use all of the types of data available for any specific organism, including protein and transcript expression information. Additionally, the release of AlphaFold predictions for all UniProt proteins provides a novel opportunity for leveraging structural information. We constructed a bespoke machine learning model to predict the function of recalcitrant proteins of unknown function in Pseudomonas putida based on these sources of data; the model annotated 213 proteins with 1079 terms. Among the predicted functions supplied by the model, we found evidence for a significant overrepresentation of nitrogen metabolism and macromolecule processing proteins. These findings were corroborated by manual analyses of selected proteins, which identified, among others, a functionally unannotated operon that likely encodes a branch of the shikimate pathway.
first_indexed 2024-04-24T23:10:54Z
format Article
id doaj.art-d606eb770ec5438dad8e3fb21607c319
institution Directory Open Access Journal
issn 1471-2164
language English
last_indexed 2024-04-24T23:10:54Z
publishDate 2024-03-01
publisher BMC
record_format Article
series BMC Genomics
spelling doaj.art-d606eb770ec5438dad8e3fb21607c319; 2024-03-17T12:16:32Z; eng; BMC; BMC Genomics; 1471-2164; 2024-03-01; volume 25, issue 1, pages 1-15; 10.1186/s12864-024-10082-y; Steven Tavis (Genome Science and Technology Graduate Program, University of Tennessee Knoxville); Robert L. Hettich (Biosciences Division, Oak Ridge National Laboratory)
title Multi-Omics integration can be used to rescue metabolic information for some of the dark region of the Pseudomonas putida proteome
topic Multi-omics integration
Proteins of unknown function
Machine learning
Gene ontology
Pseudomonas putida
Function prediction
url https://doi.org/10.1186/s12864-024-10082-y