Machine learning and statistics shape a novel path in archaeal promoter annotation

Abstract. Background: Archaea are a vast and unexplored domain. Bioinformatic techniques might light the path to higher-quality genome annotation in varied organisms. Archaeal promoter sequences are acted upon by a plethora of proteins. The structural-level conservation of...

Bibliographic Details
Main Authors: Gustavo Sganzerla Martinez, Ernesto Pérez-Rueda, Sharmilee Sarkar, Aditya Kumar, Scheila de Ávila e Silva
Format: Article
Language: English
Published: BMC 2022-05-01
Series: BMC Bioinformatics
Online Access: https://doi.org/10.1186/s12859-022-04714-x
collection DOAJ
description Abstract. Background: Archaea are a vast and unexplored domain. Bioinformatic techniques might light the path to higher-quality genome annotation in varied organisms. Archaeal promoter sequences are acted upon by a plethora of proteins. The structural-level conservation of the binding sites of proteins such as TBP, TFB, and TFE aids RNAP-DNA stabilization and makes the archaeal promoter amenable to exploration by statistical and machine learning techniques. Results and discussion: In this study, experimentally verified promoter sequences of the organisms Haloferax volcanii, Sulfolobus solfataricus, and Thermococcus kodakarensis were converted into DNA duplex stability attributes (i.e. numerical variables) and classified with artificial neural networks and an in-house statistical classification method, and tested with three forms of controls. The recognition of these promoters enabled the models to be used to validate unannotated promoter sequences in other organisms. As a result, the binding sites of basal transcription factors were located through a DNA duplex stability encoding. Additionally, the classification yielded satisfactory results (above 90%) across varied levels of control. Concluding remarks: The classification models were employed to annotate the genomes of the archaea Aciduliprofundum boonei and Thermofilum pendens, from which potential promoters were identified and uploaded to public repositories.
first_indexed 2024-04-14T00:33:15Z
format Article
id doaj.art-9a55fc5174c7482cb711da615ddf31f9
institution Directory of Open Access Journals
issn 1471-2105
spelling doaj.art-9a55fc5174c7482cb711da615ddf31f9 | 2022-12-22T02:22:28Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2022-05-01 | volume 23, issue 1, pages 1-14 | 10.1186/s12859-022-04714-x
Machine learning and statistics shape a novel path in archaeal promoter annotation
Gustavo Sganzerla Martinez (Programa de Pós-Graduação em Biotecnologia, Universidade de Caxias do Sul)
Ernesto Pérez-Rueda (Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica de Yucatán)
Sharmilee Sarkar (Department of Molecular Biology and Biotechnology, Tezpur University)
Aditya Kumar (Department of Molecular Biology and Biotechnology, Tezpur University)
Scheila de Ávila e Silva (Programa de Pós-Graduação em Biotecnologia, Universidade de Caxias do Sul)
https://doi.org/10.1186/s12859-022-04714-x
url https://doi.org/10.1186/s12859-022-04714-x