Profiling a Community-Specific Function Landscape for Bacterial Peptides Through Protein-Level Meta-Assembly and Machine Learning

Bibliographic Details
Main Authors: Mitra Vajjala, Brady Johnson, Lauren Kasparek, Michael Leuze, Qiuming Yao
Format: Article
Language: English
Published: Frontiers Media S.A. 2022-07-01
Series: Frontiers in Genetics
Subjects: bacterial peptide, machine learning, metagenomics, protein annotation, protein clustering
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2022.935351/full
collection DOAJ
description Small proteins, encoded by small open reading frames, are only beginning to emerge with current advances in omics technology and bioinformatics. There is increasing evidence that small proteins play roles in diverse critical biological functions, such as adjusting cellular metabolism, regulating other protein activities, controlling cell cycles, and affecting disease physiology. In prokaryotes such as bacteria, the sequence space and functional groups of small proteins remain largely unexplored. Most bacterial species in a natural community cannot be easily isolated or cultured, so bacterial peptides must instead be characterized metagenomically. Bacterial peptides identified from metagenomic samples can not only enrich the pool of known small proteins but also reveal community-specific microbial ecology from a small-protein perspective. In this study, metaBP (Bacterial Peptides for metagenomic sample) was developed as a comprehensive toolkit for exploring the small protein universe in metagenomic samples. It takes raw sequencing reads as input, performs protein-level meta-assembly, and computes bacterial peptide homolog groups with sample-specific mutations. metaBP also integrates general protein annotation tools as well as our small protein-specific machine learning module, metaBP-ML, to construct a full landscape for bacterial peptides. metaBP-ML shows advantages for discovering the functions of bacterial peptides in a microbial community and increases the yield of annotations by up to fivefold. The novelty of the metaBP toolkit lies in adopting protein-level assembly to discover small proteins, integrating a protein-clustering tool in the new and flexible RBiotools environment, and presenting the first small protein landscape, generated by metaBP-ML. Taken together, metaBP (and metaBP-ML) can profile functional bacterial peptides, including potentially diverse mutations, from metagenomic samples in order to depict the unique small protein landscape of a microbial community.
first_indexed 2024-04-13T20:16:56Z
format Article
id doaj.art-f8d7437afba248e5ababbbbfb9524170
institution Directory Open Access Journal
issn 1664-8021
language English
last_indexed 2024-04-13T20:16:56Z
publishDate 2022-07-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Genetics
spelling doaj.art-f8d7437afba248e5ababbbbfb9524170
2022-12-22T02:31:40Z
eng
Frontiers Media S.A.
Frontiers in Genetics
1664-8021
2022-07-01
13
10.3389/fgene.2022.935351
935351
Profiling a Community-Specific Function Landscape for Bacterial Peptides Through Protein-Level Meta-Assembly and Machine Learning
Mitra Vajjala (School of Computing, University of Nebraska-Lincoln, Lincoln, NE, United States)
Brady Johnson (School of Computing, University of Nebraska-Lincoln, Lincoln, NE, United States)
Lauren Kasparek (School of Computing, University of Nebraska-Lincoln, Lincoln, NE, United States)
Michael Leuze (Nashville Biosciences, Nashville, TN, United States)
Qiuming Yao (School of Computing, University of Nebraska-Lincoln, Lincoln, NE, United States)
https://www.frontiersin.org/articles/10.3389/fgene.2022.935351/full
bacterial peptide
machine learning
metagenomics
protein annotation
protein clustering
title Profiling a Community-Specific Function Landscape for Bacterial Peptides Through Protein-Level Meta-Assembly and Machine Learning
topic bacterial peptide
machine learning
metagenomics
protein annotation
protein clustering
url https://www.frontiersin.org/articles/10.3389/fgene.2022.935351/full