Motif models proposing independent and interdependent impacts of nucleotides are related to high and low affinity transcription factor binding sites in Arabidopsis

The position weight matrix (PWM) is the traditional motif model for representing transcription factor (TF) binding sites. It assumes that positions contribute independently to TF binding affinity, although this assumption does not fit the data perfectly. This explains why PWM hits are missing in a...

Bibliographic Details
Main Authors: Anton V. Tsukanov, Victoria V. Mironova, Victor G. Levitsky
Format: Article
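Language: English
Published: Frontiers Media S.A. 2022-07-01
Series: Frontiers in Plant Science
Subjects: de novo motif search; heterogeneity of transcription factor binding sites; high and low affinity of transcription factor binding sites; standard and alternative motif recognition models; ChIP-seq data analysis
Online Access: https://www.frontiersin.org/articles/10.3389/fpls.2022.938545/full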
author Anton V. Tsukanov
Victoria V. Mironova
Victor G. Levitsky
collection DOAJ
description The position weight matrix (PWM) is the traditional motif model for representing transcription factor (TF) binding sites. It assumes that positions contribute independently to TF binding affinity, although this assumption does not fit the data perfectly, which explains why PWM hits are missing in a substantial fraction of ChIP-seq peaks. To study various modes of direct binding of plant TFs, we compiled a benchmark collection of 111 ChIP-seq datasets for Arabidopsis thaliana and applied the traditional PWM together with two alternative motif models, BaMM and SiteGA, which account for dependencies between positions. Varying the stringency of the recognition thresholds suggested that hits of the PWM, BaMM, and SiteGA models are associated with sites of high/medium, any, and low affinity, respectively. At the medium recognition threshold, about 60% of ChIP-seq peaks contain PWM hits consisting of conserved core consensuses, while BaMM and SiteGA provide hits for an additional 15% of peaks in which a weaker core consensus is compensated through intra-motif dependencies. The presence of these dependencies in the motifs of the alternative models, and their absence in the motifs of the traditional model, was confirmed with the dependency logo DepLogo, which visualizes the position-wise partitioning of the alignments of predicted sites. We exemplify the detailed analysis of ChIP-seq profiles for the plant TFs CCA1, MYC2, and SEP3. Gene ontology (GO) enrichment analysis revealed that, among the three motif models, SiteGA had the highest proportion of genes with significantly enriched GO terms among all predicted genes. We showed that both alternative motif models extend the set of sites predicted by the traditional PWM more for the TFs MYC2 and SEP3, which have condition- or tissue-specific functions, than for the TF CCA1, which has housekeeping functions. Overall, the combined application of standard and alternative motif models helps detect various modes of direct TF-DNA interactions in the largest possible portion of ChIP-seq loci.
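The independence assumption of the PWM described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline and does not implement BaMM or SiteGA. The aligned sites, the pseudocount, the uniform background frequency, and the function names `build_pwm`, `score`, and `scan` are all hypothetical choices made for illustration only. The sketch shows that a PWM score is a plain sum of per-position log-odds weights, and that the recognition threshold decides which windows count as hits.

```python
import math

ALPHABET = "ACGT"

# Hypothetical aligned binding sites (toy data, not taken from the study).
SITES = ["CACGTG", "CACGTG", "CACATG", "CATGTG", "CACGTT", "AACGTG"]


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build per-position log-odds weights from equal-length aligned sites.

    The pseudocount and the uniform background are illustrative assumptions.
    """
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2(counts[base] / total / background) for base in ALPHABET}
        )
    return pwm


def score(pwm, window):
    """Score a window as the sum of per-position weights.

    This sum is the independence assumption: no term depends on
    more than one position of the window.
    """
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, sequence, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        window_score = score(pwm, window)
        if window_score >= threshold:
            hits.append((start, window, window_score))
    return hits


if __name__ == "__main__":
    pwm = build_pwm(SITES)
    best = score(pwm, "CACGTG")
    # A stringent threshold keeps only the strongest core consensus.
    print(scan(pwm, "AAACACGTGAAACACATGAA", threshold=best - 1e-9))
    # A relaxed threshold also admits weaker variants of the motif.
    print(scan(pwm, "AAACACGTGAAACACATGAA", threshold=best - 3.0))
```

Because each position is scored separately, a site with a weakened core can only lose score under this model. Models that include dependencies between positions can, in principle, recover such sites, which is the contrast the abstract draws.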
first_indexed 2024-12-10T08:41:49Z
id doaj.art-a0edc582289d439c839a186eb9b0e21d
institution Directory Open Access Journal
issn 1664-462X
last_indexed 2024-12-10T08:41:49Z
spelling doaj.art-a0edc582289d439c839a186eb9b0e21d; 2022-12-22T01:55:49Z; eng; Frontiers Media S.A.; Frontiers in Plant Science; ISSN 1664-462X; 2022-07-01; volume 13; DOI 10.3389/fpls.2022.938545; article 938545
Author affiliations:
Anton V. Tsukanov (0): Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk, Russia
Victoria V. Mironova (1): Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk, Russia
Victoria V. Mironova (2): Department of Plant Systems Physiology, Radboud Institute for Biological and Environmental Sciences (RIBES), Radboud University, Nijmegen, Netherlands
Victor G. Levitsky (3): Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk, Russia
Victor G. Levitsky (4): Department of Natural Science, Novosibirsk State University, Novosibirsk, Russia
title Motif models proposing independent and interdependent impacts of nucleotides are related to high and low affinity transcription factor binding sites in Arabidopsis
topic de novo motif search
heterogeneity of transcription factor binding sites
high and low affinity of transcription factor binding sites
standard and alternative motif recognition models
ChIP-seq data analysis
url https://www.frontiersin.org/articles/10.3389/fpls.2022.938545/full