Genome-Wide Prediction of Transcription Start Sites in Conifers
The identification of promoters is an essential step in the genome annotation process, providing a framework for gene regulatory networks and their role in transcription regulation. Despite considerable advances in the high-throughput determination of transcription start sites (TSSs) and transcripti...
Main Authors: | Eugeniya I. Bondar, Maxim E. Troukhan, Konstantin V. Krutovsky, Tatiana V. Tatarinova |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-02-01 |
Series: | International Journal of Molecular Sciences |
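Subjects: | transcription start site; transcription factor binding site; TATA-box; conifer; gymnosperms; promoter prediction |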
Online Access: | https://www.mdpi.com/1422-0067/23/3/1735 |
_version_ | 1797487165414834176 |
---|---|
author | Eugeniya I. Bondar; Maxim E. Troukhan; Konstantin V. Krutovsky; Tatiana V. Tatarinova |
author_facet | Eugeniya I. Bondar; Maxim E. Troukhan; Konstantin V. Krutovsky; Tatiana V. Tatarinova |
author_sort | Eugeniya I. Bondar |
collection | DOAJ |
description | The identification of promoters is an essential step in the genome annotation process, providing a framework for gene regulatory networks and their role in transcription regulation. Despite considerable advances in the high-throughput determination of transcription start sites (TSSs) and transcription factor binding sites (TFBSs), experimental methods are still time-consuming and expensive. Instead, several computational approaches have been developed to provide fast and reliable means for predicting the location of TSSs and regulatory motifs on a genome-wide scale. Numerous studies have been carried out on the regulatory elements of mammalian genomes, but plant promoters, especially in gymnosperms, have been left out of the limelight and, therefore, have been poorly investigated. The aim of this study was to enhance and expand the existing genome annotations using computational approaches for genome-wide prediction of TSSs in the four conifer species: loblolly pine, white spruce, Norway spruce, and Siberian larch. Our pipeline will be useful for TSS predictions in other genomes, especially for draft assemblies, where reliable TSS predictions are not usually available. We also explored some of the features of the nucleotide composition of the predicted promoters and compared the GC properties of conifer genes with model monocot and dicot plants. Here, we demonstrate that even incomplete genome assemblies and partial annotations can be a reliable starting point for TSS annotation. The results of the TSS prediction in four conifer species have been deposited in the Persephone genome browser, which allows smooth visualization and is optimized for large data sets. This work provides the initial basis for future experimental validation and the study of the regulatory regions to understand gene regulation in gymnosperms. |
first_indexed | 2024-03-09T23:44:40Z |
format | Article |
id | doaj.art-4a11ac076c2e48e08a5db65b9d452271 |
institution | Directory of Open Access Journals |
issn | 1661-6596 1422-0067 |
language | English |
last_indexed | 2024-03-09T23:44:40Z |
publishDate | 2022-02-01 |
publisher | MDPI AG |
record_format | Article |
series | International Journal of Molecular Sciences |
spelling | doaj.art-4a11ac076c2e48e08a5db65b9d452271; 2023-11-23T16:45:49Z; eng; MDPI AG; International Journal of Molecular Sciences; ISSN 1661-6596, 1422-0067; 2022-02-01; vol. 23, no. 3, art. 1735; DOI 10.3390/ijms23031735; Genome-Wide Prediction of Transcription Start Sites in Conifers; Eugeniya I. Bondar (0), Maxim E. Troukhan (1), Konstantin V. Krutovsky (2), Tatiana V. Tatarinova (3); (0) Laboratory of Forest Genomics, Institute of Fundamental Biology and Biotechnology, Siberian Federal University, 660036 Krasnoyarsk, Russia; (1) Persephone Software LLC, Agoura Hills, CA 91301, USA; (2) Laboratory of Forest Genomics, Institute of Fundamental Biology and Biotechnology, Siberian Federal University, 660036 Krasnoyarsk, Russia; (3) Department of Genomics and Bioinformatics, Institute of Fundamental Biology and Biotechnology, Siberian Federal University, 660074 Krasnoyarsk, Russia; The identification of promoters is an essential step in the genome annotation process, providing a framework for gene regulatory networks and their role in transcription regulation. Despite considerable advances in the high-throughput determination of transcription start sites (TSSs) and transcription factor binding sites (TFBSs), experimental methods are still time-consuming and expensive. Instead, several computational approaches have been developed to provide fast and reliable means for predicting the location of TSSs and regulatory motifs on a genome-wide scale. Numerous studies have been carried out on the regulatory elements of mammalian genomes, but plant promoters, especially in gymnosperms, have been left out of the limelight and, therefore, have been poorly investigated. The aim of this study was to enhance and expand the existing genome annotations using computational approaches for genome-wide prediction of TSSs in the four conifer species: loblolly pine, white spruce, Norway spruce, and Siberian larch. Our pipeline will be useful for TSS predictions in other genomes, especially for draft assemblies, where reliable TSS predictions are not usually available. We also explored some of the features of the nucleotide composition of the predicted promoters and compared the GC properties of conifer genes with model monocot and dicot plants. Here, we demonstrate that even incomplete genome assemblies and partial annotations can be a reliable starting point for TSS annotation. The results of the TSS prediction in four conifer species have been deposited in the Persephone genome browser, which allows smooth visualization and is optimized for large data sets. This work provides the initial basis for future experimental validation and the study of the regulatory regions to understand gene regulation in gymnosperms.; https://www.mdpi.com/1422-0067/23/3/1735; transcription start site; transcription factor binding site; TATA-box; conifer; gymnosperms; promoter prediction |
spellingShingle | Eugeniya I. Bondar; Maxim E. Troukhan; Konstantin V. Krutovsky; Tatiana V. Tatarinova; Genome-Wide Prediction of Transcription Start Sites in Conifers; International Journal of Molecular Sciences; transcription start site; transcription factor binding site; TATA-box; conifer; gymnosperms; promoter prediction |
title | Genome-Wide Prediction of Transcription Start Sites in Conifers |
title_full | Genome-Wide Prediction of Transcription Start Sites in Conifers |
title_fullStr | Genome-Wide Prediction of Transcription Start Sites in Conifers |
title_full_unstemmed | Genome-Wide Prediction of Transcription Start Sites in Conifers |
title_short | Genome-Wide Prediction of Transcription Start Sites in Conifers |
title_sort | genome wide prediction of transcription start sites in conifers |
topic | transcription start site; transcription factor binding site; TATA-box; conifer; gymnosperms; promoter prediction |
url | https://www.mdpi.com/1422-0067/23/3/1735 |
work_keys_str_mv | AT eugeniyaibondar genomewidepredictionoftranscriptionstartsitesinconifers AT maximetroukhan genomewidepredictionoftranscriptionstartsitesinconifers AT konstantinvkrutovsky genomewidepredictionoftranscriptionstartsitesinconifers AT tatianavtatarinova genomewidepredictionoftranscriptionstartsitesinconifers |