Position Weight Matrix or Acyclic Probabilistic Finite Automaton: Which model to use? A decision rule inferred for the prediction of transcription factor binding sites

Abstract Prediction of transcription factor binding sites (TFBSs) is a bioinformatics application in which DNA molecules are represented as sequences of the symbols A, C, G and T. The most widely used model for this problem is the Position Weight Matrix (PWM). Despite the advantage of being simpl...

Bibliographic Details
Main Authors: Guilherme Miura Lavezzo, Marcelo de Souza Lauretto, Luiz Paulo Moura Andrioli, Ariane Machado-Lima
Format: Article
Language: English
Published: Sociedade Brasileira de Genética 2024-01-01
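Series: Genetics and Molecular Biology
Subjects: Transcription factor binding site; position weight matrix; ChIP-seq; position dependency; model comparison
Online Access: http://www.scielo.br/scielo.php?script=sci_arttext&pid=S1415-47572023000700802&tlng=en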
collection DOAJ
description Abstract Prediction of transcription factor binding sites (TFBSs) is a bioinformatics application in which DNA molecules are represented as sequences of the symbols A, C, G and T. The most widely used model for this problem is the Position Weight Matrix (PWM). Despite the advantage of being simple, PWMs cannot capture dependencies between nucleotide positions, which may affect prediction performance. The Acyclic Probabilistic Finite Automaton (APFA) is an alternative model able to accommodate position dependencies. However, an APFA is a more complex model, which means more parameters have to be learned. In this paper, we propose an innovative method to identify when position dependencies influence the preference for PWMs or APFAs. This involved using position dependency features extracted from 1106 sets of TFBSs to infer a decision tree able to predict which model is best - PWM or APFA - for a given set of TFBSs. According to our results, as few as three pinpointed features suffice to choose the best model, providing a balance of performance (average precision) and model simplicity.
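The limitation named in the abstract, that a PWM scores each position independently, can be illustrated with a minimal Python sketch. This is not the authors' code, their APFA implementation, or their feature set: the function names, the toy sites, the pseudocount, the uniform background, and the use of mutual information as a dependency measure are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PWM from equal-length aligned sites (one dict per position).

    Assumes a uniform background and an additive pseudocount (illustrative choices).
    """
    length = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        pwm.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return pwm


def score(pwm, seq):
    """Sum of per-position log-odds: every position contributes independently."""
    return sum(column[base] for column, base in zip(pwm, seq))


def mutual_information(sites, i, j):
    """Empirical mutual information (in bits) between positions i and j."""
    n = len(sites)
    p_i = Counter(site[i] for site in sites)
    p_j = Counter(site[j] for site in sites)
    p_ij = Counter((site[i], site[j]) for site in sites)
    return sum(
        (c / n) * math.log2((c / n) / ((p_i[a] / n) * (p_j[b] / n)))
        for (a, b), c in p_ij.items()
    )


# Hypothetical toy sites: positions 0 and 1 always agree (AA or CC).
sites = ["AAG", "AAG", "CCG", "CCG"]
pwm = build_pwm(sites)

seen_score = score(pwm, "AAG")      # pattern present in the training sites
unseen_score = score(pwm, "ACG")    # pattern that breaks the coupling
coupled_mi = mutual_information(sites, 0, 1)
uncoupled_mi = mutual_information(sites, 0, 2)
```

In this toy set the PWM assigns the same score to "AAG" and to the never-observed "ACG", because it only sees per-position frequencies, while the mutual information between positions 0 and 1 is nonzero. A model that conditions on earlier positions, such as an APFA, could in principle tell the two sequences apart, at the cost of more parameters.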
first_indexed 2024-03-08T12:07:16Z
id doaj.art-ef02c2ab757b418ab2cfc1ff13d7616a
institution Directory of Open Access Journals
issn 1678-4685
last_indexed 2024-03-08T12:07:16Z
spelling doaj.art-ef02c2ab757b418ab2cfc1ff13d7616a; 2024-01-23T07:35:47Z; eng; Sociedade Brasileira de Genética; Genetics and Molecular Biology; 1678-4685; 2024-01-01; 46; 4; 10.1590/1678-4685-gmb-2023-0048; Guilherme Miura Lavezzo (https://orcid.org/0000-0002-8709-354X); Marcelo de Souza Lauretto; Luiz Paulo Moura Andrioli; Ariane Machado-Lima
topic Transcription factor binding site
position weight matrix
ChIP-seq
position dependency
model comparison