Genome-Wide Prediction of Transcription Start Sites in Conifers

Bibliographic Details
Main Authors: Eugeniya I. Bondar, Maxim E. Troukhan, Konstantin V. Krutovsky, Tatiana V. Tatarinova
Format: Article
Language: English
Published: MDPI AG 2022-02-01
Series: International Journal of Molecular Sciences
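Subjects: transcription start site; transcription factor binding site; TATA-box; conifer; gymnosperms; promoter prediction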
Online Access: https://www.mdpi.com/1422-0067/23/3/1735
author Eugeniya I. Bondar
Maxim E. Troukhan
Konstantin V. Krutovsky
Tatiana V. Tatarinova
author_sort Eugeniya I. Bondar
collection DOAJ
description The identification of promoters is an essential step in the genome annotation process, providing a framework for gene regulatory networks and their role in transcription regulation. Despite considerable advances in the high-throughput determination of transcription start sites (TSSs) and transcription factor binding sites (TFBSs), experimental methods are still time-consuming and expensive. Instead, several computational approaches have been developed to provide fast and reliable means for predicting the location of TSSs and regulatory motifs on a genome-wide scale. Numerous studies have been carried out on the regulatory elements of mammalian genomes, but plant promoters, especially in gymnosperms, have been left out of the limelight and, therefore, have been poorly investigated. The aim of this study was to enhance and expand the existing genome annotations using computational approaches for genome-wide prediction of TSSs in the four conifer species: loblolly pine, white spruce, Norway spruce, and Siberian larch. Our pipeline will be useful for TSS predictions in other genomes, especially for draft assemblies, where reliable TSS predictions are not usually available. We also explored some of the features of the nucleotide composition of the predicted promoters and compared the GC properties of conifer genes with model monocot and dicot plants. Here, we demonstrate that even incomplete genome assemblies and partial annotations can be a reliable starting point for TSS annotation. The results of the TSS prediction in four conifer species have been deposited in the Persephone genome browser, which allows smooth visualization and is optimized for large data sets. This work provides the initial basis for future experimental validation and the study of the regulatory regions to understand gene regulation in gymnosperms.
first_indexed 2024-03-09T23:44:40Z
format Article
id doaj.art-4a11ac076c2e48e08a5db65b9d452271
institution Directory of Open Access Journals
issn 1661-6596
1422-0067
language English
last_indexed 2024-03-09T23:44:40Z
publishDate 2022-02-01
publisher MDPI AG
record_format Article
series International Journal of Molecular Sciences
spelling doaj.art-4a11ac076c2e48e08a5db65b9d452271
2023-11-23T16:45:49Z
eng
MDPI AG
International Journal of Molecular Sciences
1661-6596
1422-0067
2022-02-01
Volume 23, Issue 3, Article 1735
10.3390/ijms23031735
Genome-Wide Prediction of Transcription Start Sites in Conifers
Eugeniya I. Bondar (0)
Maxim E. Troukhan (1)
Konstantin V. Krutovsky (2)
Tatiana V. Tatarinova (3)
(0) Laboratory of Forest Genomics, Institute of Fundamental Biology and Biotechnology, Siberian Federal University, 660036 Krasnoyarsk, Russia
(1) Persephone Software LLC, Agoura Hills, CA 91301, USA
(2) Laboratory of Forest Genomics, Institute of Fundamental Biology and Biotechnology, Siberian Federal University, 660036 Krasnoyarsk, Russia
(3) Department of Genomics and Bioinformatics, Institute of Fundamental Biology and Biotechnology, Siberian Federal University, 660074 Krasnoyarsk, Russia
https://www.mdpi.com/1422-0067/23/3/1735
transcription start site; transcription factor binding site; TATA-box; conifer; gymnosperms; promoter prediction
title Genome-Wide Prediction of Transcription Start Sites in Conifers
topic transcription start site
transcription factor binding site
TATA-box
conifer
gymnosperms
promoter prediction
url https://www.mdpi.com/1422-0067/23/3/1735