Unraveling the hidden universe of small proteins in bacterial genomes

Bibliographic Details
Main Authors: Samuel Miravet‐Verde, Tony Ferrar, Guadalupe Espadas‐García, Rocco Mazzolini, Anas Gharrab, Eduard Sabido, Luis Serrano, Maria Lluch‐Senar
Format: Article
Language: English
Published: Springer Nature 2019-02-01
Series: Molecular Systems Biology
Subjects: mass spectroscopy; mycoplasmas; protein prediction; random forest classifier; small proteins
Online Access: https://doi.org/10.15252/msb.20188290
ISSN: 1744-4292
Volume: 15
Issue: 2
Collection: DOAJ (Directory of Open Access Journals)
Affiliations: EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain (Samuel Miravet‐Verde, Tony Ferrar, Rocco Mazzolini, Anas Gharrab, Luis Serrano, Maria Lluch‐Senar); Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain (Guadalupe Espadas‐García, Eduard Sabido)

Abstract
Identification of small open reading frames (smORFs) encoding small proteins (≤ 100 amino acids; SEPs) is a challenge in the fields of genome annotation and protein discovery. Here, by combining a novel bioinformatics tool (RanSEPs) with “‐omics” approaches, we were able to describe 109 bacterial small ORFomes. Predictions were first validated by performing an exhaustive search of SEPs present in the Mycoplasma pneumoniae proteome via mass spectrometry, which illustrated the limitations of shotgun approaches. Then, RanSEPs predictions were validated and compared with other tools using proteomic datasets from different bacterial species and SEPs from the literature. We found that up to 16 ± 9% of proteins in an organism could be classified as SEPs. Integration of RanSEPs predictions with transcriptomics data showed that some annotated non‐coding RNAs could in fact encode SEPs. A functional study of SEPs highlighted an enrichment in the membrane, translation, metabolism, and nucleotide‐binding categories. Additionally, 9.7% of the SEPs included a predicted N‐terminal signal peptide. We envision RanSEPs as a tool to unmask the hidden universe of small bacterial proteins.