Nucleotide patterns aiding in prediction of eukaryotic promoters.
Computational analysis of promoters is hindered by the complexity of their architecture. In less studied genomes with complex organization, false positive promoter predictions are common. Accurate identification of transcription start sites and core promoter regions remains an unsolved problem. In t...
Main Authors: | Martin Triska, Victor Solovyev, Ancha Baranova, Alexander Kel, Tatiana V Tatarinova |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS) 2017-01-01 |
Series: | PLoS ONE |
Online Access: | http://europepmc.org/articles/PMC5687710?pdf=render |
_version_ | 1818305646976565248 |
---|---|
author | Martin Triska; Victor Solovyev; Ancha Baranova; Alexander Kel; Tatiana V Tatarinova |
author_sort | Martin Triska |
collection | DOAJ |
description | Computational analysis of promoters is hindered by the complexity of their architecture. In less studied genomes with complex organization, false positive promoter predictions are common. Accurate identification of transcription start sites and core promoter regions remains an unsolved problem. In this paper, we present a comprehensive analysis of genomic features associated with promoters and show that models driven by probabilistic integrative algorithms allow accurate classification of DNA sequences into "promoters" and "non-promoters" even in the absence of full-length cDNA sequences. These models may be built upon maps of the distributions of sequence polymorphisms, RNA sequencing reads on genomic DNA, methylated nucleotides, transcription factor binding sites, as well as relative frequencies of nucleotides and their combinations. Positional clustering of binding sites shows that the cells of Oryza sativa utilize three distinct classes of transcription factors: those that bind preferentially to the [-500,0] region (188 "promoter-specific" transcription factors), those that bind preferentially to the [0,500] region (282 "5' UTR-specific" transcription factors), and those with little or no location preference with respect to the transcription start site (207 "promiscuous" transcription factors). For the most informative motifs, positional preferences are conserved between dicots and monocots. |
first_indexed | 2024-12-13T06:29:54Z |
format | Article |
id | doaj.art-bbe76b5996ce4e6dbc001c19e9ccd154 |
institution | Directory Open Access Journal |
issn | 1932-6203 |
language | English |
last_indexed | 2024-12-13T06:29:54Z |
publishDate | 2017-01-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
spelling | doaj.art-bbe76b5996ce4e6dbc001c19e9ccd154; 2022-12-21T23:56:38Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2017-01-01; 12(11): e0187243; 10.1371/journal.pone.0187243; Nucleotide patterns aiding in prediction of eukaryotic promoters.; Martin Triska; Victor Solovyev; Ancha Baranova; Alexander Kel; Tatiana V Tatarinova; Computational analysis of promoters is hindered by the complexity of their architecture. In less studied genomes with complex organization, false positive promoter predictions are common. Accurate identification of transcription start sites and core promoter regions remains an unsolved problem. In this paper, we present a comprehensive analysis of genomic features associated with promoters and show that models driven by probabilistic integrative algorithms allow accurate classification of DNA sequences into "promoters" and "non-promoters" even in the absence of full-length cDNA sequences. These models may be built upon maps of the distributions of sequence polymorphisms, RNA sequencing reads on genomic DNA, methylated nucleotides, transcription factor binding sites, as well as relative frequencies of nucleotides and their combinations. Positional clustering of binding sites shows that the cells of Oryza sativa utilize three distinct classes of transcription factors: those that bind preferentially to the [-500,0] region (188 "promoter-specific" transcription factors), those that bind preferentially to the [0,500] region (282 "5' UTR-specific" transcription factors), and those with little or no location preference with respect to the transcription start site (207 "promiscuous" transcription factors). For the most informative motifs, positional preferences are conserved between dicots and monocots.; http://europepmc.org/articles/PMC5687710?pdf=render |
title | Nucleotide patterns aiding in prediction of eukaryotic promoters. |
url | http://europepmc.org/articles/PMC5687710?pdf=render |
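
The description above groups transcription factors into three classes by where their binding sites fall relative to the transcription start site (TSS). The following is a minimal Python sketch of that idea, not the authors' method: the record does not state how positional clustering was performed, so the function name `classify_tf`, the `min_fraction` cutoff of 0.6, and the choice to count position 0 as downstream are all assumptions made for illustration. Only the [-500,0] and [0,500] windows and the three class labels come from the description.

```python
def classify_tf(positions, window=500, min_fraction=0.6):
    """Assign one transcription factor to a positional class.

    positions: binding-site offsets relative to the TSS
               (negative = upstream, positive = downstream).
    window and min_fraction are illustrative assumptions,
    not values taken from the paper.
    """
    # Count sites in the upstream window [-window, 0).
    upstream = sum(1 for p in positions if -window <= p < 0)
    # Count sites in the downstream window [0, window];
    # position 0 is assigned here by assumption.
    downstream = sum(1 for p in positions if 0 <= p <= window)
    total = upstream + downstream
    if total == 0:
        return "no sites in window"
    if upstream / total >= min_fraction:
        return "promoter-specific"
    if downstream / total >= min_fraction:
        return "5' UTR-specific"
    return "promiscuous"


# Hypothetical binding-site offsets for three made-up factors.
example_sites = {
    "TF_A": [-420, -310, -150, -40, 25],
    "TF_B": [30, 120, 260, 410, -75],
    "TF_C": [-300, -100, 100, 300],
}
labels = {name: classify_tf(sites) for name, sites in example_sites.items()}
```

With these made-up offsets, `TF_A` is labeled promoter-specific, `TF_B` 5' UTR-specific, and `TF_C` promiscuous. A fixed fraction cutoff is only one possible way to express "binds preferentially"; a statistical test against a uniform background would be a reasonable alternative.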