Machine learning and statistics shape a novel path in archaeal promoter annotation
Main Authors: | Gustavo Sganzerla Martinez, Ernesto Pérez-Rueda, Sharmilee Sarkar, Aditya Kumar, Scheila de Ávila e Silva |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2022-05-01 |
Series: | BMC Bioinformatics |
Online Access: | https://doi.org/10.1186/s12859-022-04714-x |
author | Gustavo Sganzerla Martinez, Ernesto Pérez-Rueda, Sharmilee Sarkar, Aditya Kumar, Scheila de Ávila e Silva |
collection | DOAJ |
description | Abstract. Background: Archaea are a vast and unexplored domain. Bioinformatic techniques might light the way to higher-quality genome annotation in diverse organisms. Archaeal promoter sequences are acted upon by a plethora of proteins. The structural-level conservation of the binding sites of proteins such as TBP, TFB, and TFE aids RNAP-DNA stabilization and makes the archaeal promoter amenable to exploration with statistical and machine learning techniques. Results and discussions: In this study, experimentally verified promoter sequences of the organisms Haloferax volcanii, Sulfolobus solfataricus, and Thermococcus kodakarensis were converted into DNA duplex stability attributes (i.e. numerical variables) and classified with Artificial Neural Networks and an in-house statistical classification method, both tested against three forms of control. Recognizing these promoters made it possible to use the models to validate unannotated promoter sequences in other organisms. As a result, the binding site of basal transcription factors was located through a DNA duplex stability encoding. Additionally, the classification achieved satisfactory results (above 90%) across varied levels of control. Concluding remarks: The classification models were employed to annotate the genomes of the archaea Aciduliprofundum boonei and Thermofilum pendens, in which potential promoters were identified and uploaded to public repositories. |
first_indexed | 2024-04-14T00:33:15Z |
id | doaj.art-9a55fc5174c7482cb711da615ddf31f9 |
institution | Directory of Open Access Journals |
issn | 1471-2105 |
last_indexed | 2024-04-14T00:33:15Z |
affiliations | Gustavo Sganzerla Martinez and Scheila de Ávila e Silva: Programa de Pós-Graduação em Biotecnologia, Universidade de Caxias do Sul; Ernesto Pérez-Rueda: Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica de Yucatán; Sharmilee Sarkar and Aditya Kumar: Department of Molecular Biology and Biotechnology, Tezpur University |
citation | BMC Bioinformatics, vol. 23, no. 1, pp. 1-14 (2022-05-01), doi:10.1186/s12859-022-04714-x |
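The abstract says the promoter sequences were converted into DNA duplex stability attributes (numerical variables) before classification, but this record does not state which thermodynamic parameters or window size were used. The following is therefore only a minimal sketch under stated assumptions: it scores each overlapping dinucleotide with the unified nearest-neighbour ΔG°37 values of SantaLucia (1998) and sums them over a sliding window. The names `NN_DG37`, `dinucleotide_energies` and `stability_profile`, and the default 15-nt window, are hypothetical illustrations, not the authors' code.

```python
from typing import Dict, List

# ASSUMPTION: unified nearest-neighbour free energies (kcal/mol, 37 °C) from
# SantaLucia (1998), keyed by the 5'->3' top-strand dinucleotide. Reverse
# complements (e.g. CA and TG) share a value. The paper's own parameters may differ.
NN_DG37: Dict[str, float] = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}


def dinucleotide_energies(seq: str) -> List[float]:
    """Return the ΔG°37 of each overlapping dinucleotide step in `seq`.

    Raises KeyError for characters other than A, C, G, T.
    """
    seq = seq.upper()
    return [NN_DG37[seq[i:i + 2]] for i in range(len(seq) - 1)]


def stability_profile(seq: str, window: int = 15) -> List[float]:
    """Sum step energies over every sliding window of `window` bases.

    Hypothetical encoding: more negative sums mark more stable duplex
    regions, so AT-rich segments (e.g. TATA-like boxes) stand out as
    less negative values. Initiation and terminal corrections are omitted.
    """
    steps = dinucleotide_energies(seq)
    n_steps = window - 1  # a window of `window` bases spans window - 1 steps
    if n_steps < 1 or len(steps) < n_steps:
        raise ValueError("window must be >= 2 and no longer than the sequence")
    return [
        round(sum(steps[i:i + n_steps]), 2)
        for i in range(len(steps) - n_steps + 1)
    ]


if __name__ == "__main__":
    # An AT-rich stretch followed by a GC-rich one: the profile becomes more
    # negative (more stable) as the window slides into the G block.
    print(stability_profile("AAAAAAGGGGGG", window=6))
```

Each window sum is one numerical attribute, so a promoter region of fixed length becomes a fixed-length vector that a neural network or another classifier could take as input. A sliding-window sum is used here only because it makes local stability dips easy to see; per-step energies from `dinucleotide_energies` are an equally plausible encoding.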