Position Weight Matrix or Acyclic Probabilistic Finite Automaton: Which model to use? A decision rule inferred for the prediction of transcription factor binding sites
Abstract Prediction of transcription factor binding sites (TFBS) is an example of application of Bioinformatics where DNA molecules are represented as sequences of A, C, G and T symbols. The most used model in this problem is Position Weight Matrix (PWM). Notwithstanding the advantage of being simpl...
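The abstract's point that a PWM treats every position independently can be illustrated with a small sketch. The code below is not the authors' implementation: the function names `build_pwm` and `score`, the pseudocount of 1, the uniform background of 0.25, and the toy binding sites are all assumptions chosen for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM (one dict per position) from aligned sites.

    The pseudocount and uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(length):
        # Count each base at this position only; other positions are ignored.
        counts = Counter(site[pos] for site in sites)
        pwm.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return pwm


def score(pwm, seq):
    """Sum the per-position log-odds; no term couples two positions."""
    return sum(column[base] for column, base in zip(pwm, seq))


# Hypothetical toy sites with a perfect dependency: the second base
# always equals the first (only "AA" and "TT" are observed).
sites = ["AA", "TT", "AA", "TT"]
pwm = build_pwm(sites)

print(score(pwm, "AA"))  # an observed site
print(score(pwm, "AT"))  # never observed, yet scored identically
```

Because the score is a sum of independent per-position terms, the unobserved sequence `AT` receives the same score as the observed site `AA`. A model that conditions on earlier positions, such as the APFA discussed in the abstract, could in principle tell the two apart, at the cost of more parameters.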
Main Authors: | Guilherme Miura Lavezzo, Marcelo de Souza Lauretto, Luiz Paulo Moura Andrioli, Ariane Machado-Lima |
---|---|
Format: | Article |
Language: | English |
Published: | Sociedade Brasileira de Genética, 2024-01-01 |
Series: | Genetics and Molecular Biology |
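Subjects: | Transcription factor binding site, position weight matrix, ChIP-seq, position dependency, model comparison |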
Online Access: | http://www.scielo.br/scielo.php?script=sci_arttext&pid=S1415-47572023000700802&tlng=en |
_version_ | 1827376258822963200 |
---|---|
author | Guilherme Miura Lavezzo, Marcelo de Souza Lauretto, Luiz Paulo Moura Andrioli, Ariane Machado-Lima |
author_sort | Guilherme Miura Lavezzo |
collection | DOAJ |
description | Abstract Prediction of transcription factor binding sites (TFBS) is an example of application of Bioinformatics where DNA molecules are represented as sequences of A, C, G and T symbols. The most used model in this problem is Position Weight Matrix (PWM). Notwithstanding the advantage of being simple, PWMs cannot capture dependency between nucleotide positions, which may affect prediction performance. Acyclic Probabilistic Finite Automata (APFA) is an alternative model able to accommodate position dependencies. However, APFA is a more complex model, which means more parameters have to be learned. In this paper, we propose an innovative method to identify when position dependencies influence preference for PWMs or APFAs. This implied using position dependency features extracted from 1106 sets of TFBS to infer a decision tree able to predict which is the best model - PWM or APFA - for a given set of TFBSs. According to our results, as few as three pinpointed features are able to choose the best model, providing a balance of performance (average precision) and model simplicity. |
first_indexed | 2024-03-08T12:07:16Z |
format | Article |
id | doaj.art-ef02c2ab757b418ab2cfc1ff13d7616a |
institution | Directory Open Access Journal |
issn | 1678-4685 |
language | English |
last_indexed | 2024-03-08T12:07:16Z |
publishDate | 2024-01-01 |
publisher | Sociedade Brasileira de Genética |
record_format | Article |
series | Genetics and Molecular Biology |
spelling | doaj.art-ef02c2ab757b418ab2cfc1ff13d7616a; 2024-01-23T07:35:47Z; eng; Sociedade Brasileira de Genética; Genetics and Molecular Biology; 1678-4685; 2024-01-01; 464; 10.1590/1678-4685-gmb-2023-0048; Position Weight Matrix or Acyclic Probabilistic Finite Automaton: Which model to use? A decision rule inferred for the prediction of transcription factor binding sites; Guilherme Miura Lavezzo; https://orcid.org/0000-0002-8709-354X; Marcelo de Souza Lauretto; Luiz Paulo Moura Andrioli; Ariane Machado-Lima; Abstract Prediction of transcription factor binding sites (TFBS) is an example of application of Bioinformatics where DNA molecules are represented as sequences of A, C, G and T symbols. The most used model in this problem is Position Weight Matrix (PWM). Notwithstanding the advantage of being simple, PWMs cannot capture dependency between nucleotide positions, which may affect prediction performance. Acyclic Probabilistic Finite Automata (APFA) is an alternative model able to accommodate position dependencies. However, APFA is a more complex model, which means more parameters have to be learned. In this paper, we propose an innovative method to identify when position dependencies influence preference for PWMs or APFAs. This implied using position dependency features extracted from 1106 sets of TFBS to infer a decision tree able to predict which is the best model - PWM or APFA - for a given set of TFBSs. According to our results, as few as three pinpointed features are able to choose the best model, providing a balance of performance (average precision) and model simplicity.; http://www.scielo.br/scielo.php?script=sci_arttext&pid=S1415-47572023000700802&tlng=en; Transcription factor binding site; position weight matrix; ChIP-seq; position dependency; model comparison |
title | Position Weight Matrix or Acyclic Probabilistic Finite Automaton: Which model to use? A decision rule inferred for the prediction of transcription factor binding sites |
topic | Transcription factor binding site; position weight matrix; ChIP-seq; position dependency; model comparison |
url | http://www.scielo.br/scielo.php?script=sci_arttext&pid=S1415-47572023000700802&tlng=en |