Nucleotide patterns aiding in prediction of eukaryotic promoters.

Computational analysis of promoters is hindered by the complexity of their architecture. In less studied genomes with complex organization, false positive promoter predictions are common. Accurate identification of transcription start sites and core promoter regions remains an unsolved problem. In t...

Bibliographic Details
Main Authors: Martin Triska, Victor Solovyev, Ancha Baranova, Alexander Kel, Tatiana V Tatarinova
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2017-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC5687710?pdf=render
_version_ 1818305646976565248
author Martin Triska
Victor Solovyev
Ancha Baranova
Alexander Kel
Tatiana V Tatarinova
collection DOAJ
description Computational analysis of promoters is hindered by the complexity of their architecture. In less studied genomes with complex organization, false positive promoter predictions are common. Accurate identification of transcription start sites (TSS) and core promoter regions remains an unsolved problem. In this paper, we present a comprehensive analysis of genomic features associated with promoters and show that models driven by probabilistic integrative algorithms allow accurate classification of DNA sequences into "promoters" and "non-promoters" even in the absence of full-length cDNA sequences. These models may be built upon maps of the distributions of sequence polymorphisms, RNA sequencing reads on genomic DNA, methylated nucleotides, and transcription factor binding sites, as well as the relative frequencies of nucleotides and their combinations. Positional clustering of binding sites shows that the cells of Oryza sativa utilize three distinct classes of transcription factors (TFs): those that bind preferentially to the [-500,0] region (188 "promoter-specific" TFs), those that bind preferentially to the [0,500] region (282 "5' UTR-specific" TFs), and those with little or no location preference with respect to the TSS (207 "promiscuous" TFs). For the most informative motifs, positional preferences are conserved between dicots and monocots.
first_indexed 2024-12-13T06:29:54Z
format Article
id doaj.art-bbe76b5996ce4e6dbc001c19e9ccd154
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-12-13T06:29:54Z
publishDate 2017-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-bbe76b5996ce4e6dbc001c19e9ccd154 | 2022-12-21T23:56:38Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2017-01-01 | 12(11): e0187243 | 10.1371/journal.pone.0187243 | Nucleotide patterns aiding in prediction of eukaryotic promoters. | Martin Triska, Victor Solovyev, Ancha Baranova, Alexander Kel, Tatiana V Tatarinova | http://europepmc.org/articles/PMC5687710?pdf=render
title Nucleotide patterns aiding in prediction of eukaryotic promoters.
url http://europepmc.org/articles/PMC5687710?pdf=render
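
The description in this record groups transcription factors by where their binding sites fall relative to the TSS: the [-500,0] region, the [0,500] region, or no clear preference. A minimal sketch of that three-way labelling follows. It is an illustration only, not the authors' method: the function name `classify_tf`, the `min_fraction` threshold of 0.6, and the example positions are all assumptions, and the paper's positional clustering may use a different statistical criterion. Because the two stated regions share position 0, the sketch arbitrarily counts position 0 as downstream.

```python
def classify_tf(positions, window=500, min_fraction=0.6):
    """Label one TF by where its binding sites fall relative to the TSS (position 0).

    positions    -- binding-site coordinates relative to the TSS (hypothetical input)
    window       -- half-width of the region considered on each side of the TSS
    min_fraction -- assumed cutoff for calling a positional preference
    """
    # Sites upstream of the TSS, in [-window, 0)
    upstream = sum(1 for p in positions if -window <= p < 0)
    # Sites at or downstream of the TSS, in [0, window]
    downstream = sum(1 for p in positions if 0 <= p <= window)
    total = upstream + downstream
    if total == 0:
        return "no sites in window"
    if upstream / total >= min_fraction:
        return "promoter-specific"
    if downstream / total >= min_fraction:
        return "5' UTR-specific"
    return "promiscuous"


# Made-up binding-site positions for three hypothetical TFs
example_sites = {
    "TF_A": [-420, -310, -150, -40, 120],
    "TF_B": [-200, 30, 90, 250, 400],
    "TF_C": [-300, -100, 100, 300],
}

for name, sites in example_sites.items():
    print(name, classify_tf(sites))
```

A fixed fraction cutoff keeps the sketch dependency-free; with real data, a statistical test of upstream versus downstream counts would likely be a more defensible criterion.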