Automated incorporation of pairwise dependency in transcription factor binding site prediction using dinucleotide weight tensors.

Gene regulatory networks are ultimately encoded by the sequence-specific binding of transcription factors (TFs) to short DNA segments. Although it is customary to represent the binding specificity of a TF by a position-specific weight matrix (PSWM), which assumes each position within a site contributes independently to t...


Bibliographic Details
Main Authors: Saeed Omidi, Mihaela Zavolan, Mikhail Pachkov, Jeremie Breda, Severin Berger, Erik van Nimwegen
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2017-07-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1005176
collection DOAJ
description Gene regulatory networks are ultimately encoded by the sequence-specific binding of transcription factors (TFs) to short DNA segments. Although it is customary to represent the binding specificity of a TF by a position-specific weight matrix (PSWM), which assumes each position within a site contributes independently to the overall binding affinity, evidence has been accumulating that there can be significant dependencies between positions. Unfortunately, methodological challenges have so far hindered the development of a practical and generally accepted extension of the PSWM model. On the one hand, simple models that only consider dependencies between nearest-neighbor positions are easy to use in practice, but fail to account for the distal dependencies that are observed in the data. On the other hand, models that allow for arbitrary dependencies are prone to overfitting, requiring regularization schemes that are difficult for non-experts to use in practice. Here we present a new regulatory motif model, called the dinucleotide weight tensor (DWT), that incorporates arbitrary pairwise dependencies between positions in binding sites, rigorously from first principles, and free from tunable parameters. We demonstrate the power of the method on a large set of ChIP-seq datasets, showing that DWTs outperform both PSWMs and motif models that only incorporate nearest-neighbor dependencies. We also demonstrate that DWTs outperform two previously proposed methods. Finally, we show that DWTs inferred from ChIP-seq data also outperform PSWMs on HT-SELEX data for the same TF, suggesting that DWTs capture inherent biophysical properties of the interactions between the DNA binding domains of TFs and their binding sites. We make a suite of DWT tools available at dwt.unibas.ch that allow users to automatically perform 'motif finding', i.e. the inference of DWT motifs from a set of sequences, binding site prediction with DWTs, and visualization of DWT 'dilogo' motifs.
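The independence assumption mentioned in the abstract can be illustrated with a small Python sketch. This is not the DWT model from the article and not code from the tools at dwt.unibas.ch; the toy binding sites, the pseudocount, the uniform background, and all function names are assumptions chosen for illustration only. It builds a PSWM from four made-up sites in which the first and last positions always co-vary, then shows that the PSWM cannot tell an observed site from one that breaks that pairing, while the mutual information between the two columns exposes the dependency.

```python
import math
from collections import Counter

BASES = "ACGT"

# Hypothetical toy sites: position 0 and position 3 always pair up
# as (A, T) or (T, A); positions 1 and 2 are constant.
SITES = ["ACGT", "TCGA", "ACGT", "TCGA"]


def pswm_from_sites(sites, pseudocount=1.0):
    """Estimate per-position base probabilities (one dict per column)."""
    total = len(sites) + 4 * pseudocount
    pswm = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        pswm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return pswm


def pswm_log_odds(pswm, seq, background=0.25):
    """Score a sequence as a sum of independent per-position log-odds."""
    return sum(math.log(col[b] / background) for col, b in zip(pswm, seq))


def mutual_information(sites, i, j):
    """Mutual information (in bits) between columns i and j of the sites."""
    n = len(sites)
    pair_counts = Counter((s[i], s[j]) for s in sites)
    counts_i = Counter(s[i] for s in sites)
    counts_j = Counter(s[j] for s in sites)
    mi = 0.0
    for (a, b), c in pair_counts.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((counts_i[a] / n) * (counts_j[b] / n)))
    return mi


pswm = pswm_from_sites(SITES)
score_observed = pswm_log_odds(pswm, "ACGT")   # pairing seen in the data
score_violating = pswm_log_odds(pswm, "ACGA")  # pairing never seen in the data
mi_distal = mutual_information(SITES, 0, 3)    # co-varying columns
mi_constant = mutual_information(SITES, 0, 1)  # column 1 never varies
```

In this toy setting the two PSWM scores are equal, because each column is scored on its own, whereas the mutual information between columns 0 and 3 is one bit and the mutual information between columns 0 and 1 is zero. A model that scores pairs of positions jointly, which is the kind of dependency the abstract describes, would be able to separate the two sequences.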
first_indexed 2024-12-18T00:15:27Z
id doaj.art-aa3dd7e6d78d40c9a5134f79b59fa0ed
institution Directory Open Access Journal
issn 1553-734X
1553-7358
last_indexed 2024-12-18T00:15:27Z
spelling doaj.art-aa3dd7e6d78d40c9a5134f79b59fa0ed 2022-12-21T21:27:32Z eng Public Library of Science (PLoS), PLoS Computational Biology, ISSN 1553-734X, 1553-7358, 2017-07-01, vol. 13, no. 7, e1005176, doi:10.1371/journal.pcbi.1005176