Bacillus subtilis promoter sequences data set for promoter prediction in Gram-positive bacteria

This paper presents a prediction of Bacillus subtilis promoters using a Support Vector Machine system. In the literature, there is a lack of information on Gram-positive bacterial promoter sequences compared to Gram-negative bacteria. Promoter sequence identification is essential for studying gene e...


Bibliographic Details
Main Authors: Rafael Vieira Coelho, Scheila de Avila e Silva, Sergio Echeverrigaray, Ana Paula Longaray Delamare
Format: Article
Language: English
Published: Elsevier 2018-08-01
Series: Data in Brief
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340918305286
author Rafael Vieira Coelho
Scheila de Avila e Silva
Sergio Echeverrigaray
Ana Paula Longaray Delamare
collection DOAJ
description This paper presents a prediction of Bacillus subtilis promoters using a Support Vector Machine system. In the literature, there is a lack of information on Gram-positive bacterial promoter sequences compared to Gram-negative bacteria. Promoter sequence identification is essential for studying gene expression. Initially, we collected the B. subtilis genome sequence from the NCBI database, and promoters were identified by their sigma factors in the DBTBS database. We then grouped the promoters according to 15 factors in 2 domains, corresponding to sigma 54 and sigma 70 of Gram-negative bacteria. Based on these data, we developed a Python script to search for promoters in the B. subtilis genome. After processing the data, we obtained 767 promoter sequences for B. subtilis, most of which were recognized by the sigma factor SigA. To validate these data, we developed a software package called BacSVM+, which receives promoters as input and returns the best combination of LibSVM parameters for predicting promoter regions in the bacteria used in the simulation. All data gathered, as well as the BacSVM+ software, are available for download at http://bacpp.bioinfoucs.com/rafael/Sigmas.zip. Keywords: Promoter sequences, Bacillus subtilis, SVM
first_indexed 2024-12-22T09:09:06Z
format Article
id doaj.art-bf5ef8aa1f2344cf847d96adc97eb7c7
institution Directory Open Access Journal
issn 2352-3409
language English
last_indexed 2024-12-22T09:09:06Z
publishDate 2018-08-01
publisher Elsevier
record_format Article
series Data in Brief
spelling doaj.art-bf5ef8aa1f2344cf847d96adc97eb7c7
2022-12-21T18:31:31Z
eng
Elsevier
Data in Brief
2352-3409
2018-08-01
volume 19, pages 264-270
Bacillus subtilis promoter sequences data set for promoter prediction in Gram-positive bacteria
Rafael Vieira Coelho (0)
Scheila de Avila e Silva (1)
Sergio Echeverrigaray (2)
Ana Paula Longaray Delamare (3)
(0) Rio Grande do Sul Federal Institute of Education, Science and Technology (IFRS), Farroupilha Campus, Farroupilha, RS, Brazil; Corresponding author.
(1) Biotechnology Institute, University of Caxias do Sul (UCS), Caxias do Sul, RS, Brazil
(2) Biotechnology Institute, University of Caxias do Sul (UCS), Caxias do Sul, RS, Brazil
(3) Biotechnology Institute, University of Caxias do Sul (UCS), Caxias do Sul, RS, Brazil
title Bacillus subtilis promoter sequences data set for promoter prediction in Gram-positive bacteria
url http://www.sciencedirect.com/science/article/pii/S2352340918305286
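
The description in this record mentions a Python script that searches the B. subtilis genome for promoter sequences. The record does not say how that script works, so the following is only a minimal sketch of one plausible approach: exact matching of each known promoter on both strands. The function names `reverse_complement` and `find_promoters` and the toy sequences are hypothetical, and nothing here is taken from the authors' code or data set.

```python
# Illustrative sketch only: exact-match search on both strands.
# Function names and the toy data are assumptions, not the authors' script.

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_promoters(genome: str, promoters: dict) -> list:
    """Locate each promoter by exact match on either strand.

    Returns (name, strand, start) tuples, where start is the 0-based
    position of the match on the forward strand.
    """
    genome = genome.upper()
    hits = []
    for name, seq in promoters.items():
        patterns = (("+", seq.upper()), ("-", reverse_complement(seq)))
        for strand, pattern in patterns:
            start = genome.find(pattern)
            while start != -1:
                hits.append((name, strand, start))
                # Continue one base further so overlapping matches are found.
                start = genome.find(pattern, start + 1)
    return hits


if __name__ == "__main__":
    toy_genome = "GGTTGACAGGTGTCAACC"   # made-up sequence
    toy_promoters = {"p1": "TTGACA"}    # made-up promoter fragment
    for hit in find_promoters(toy_genome, toy_promoters):
        print(hit)
```

Exact matching keeps the sketch short. A real pipeline would more likely read the genome from a FASTA file and use coordinates from the promoter database instead of string search.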