Consideration of non-canonical splice sites improves gene prediction on the Arabidopsis thaliana Niederzenz-1 genome sequence
Abstract. Objective: The Arabidopsis thaliana Niederzenz-1 genome sequence was recently published with an ab initio gene prediction. In-depth analysis of the predicted gene set revealed some errors involving genes with non-canonical splice sites in their introns. Since non-canonical splice sites are d...
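
To make the term concrete: an intron is conventionally called canonical when it starts with GT and ends with AG, and non-canonical otherwise (GC-AG and AT-AC are the best-known alternatives). The following is a minimal sketch of that classification, not the authors' pipeline; the function name `classify_intron` and the toy intron sequences are hypothetical and chosen only for illustration.

```python
def classify_intron(intron_seq: str) -> str:
    """Label an intron by its terminal dinucleotides.

    Assumption: the sequence is given on the coding strand, 5' to 3',
    and spans exactly the intron (donor site first, acceptor site last).
    """
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to carry both splice sites")
    donor, acceptor = seq[:2], seq[-2:]
    if (donor, acceptor) == ("GT", "AG"):
        return "canonical"
    return "non-canonical"


# Hypothetical toy introns, not taken from the Nd-1 or Col-0 annotation.
introns = {
    "intron_a": "GTAAGTTTTCAG",  # GT...AG
    "intron_b": "GCAAGTTTTCAG",  # GC...AG
    "intron_c": "ATATCCTTTTAC",  # AT...AC
}
labels = {name: classify_intron(seq) for name, seq in introns.items()}
```

An ab initio predictor that only models GT-AG introns cannot place the borders of `intron_b` or `intron_c` correctly, which is the kind of error the abstract describes.
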
Main Authors: | Boas Pucker, Daniela Holtgräwe, Bernd Weisshaar |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2017-12-01 |
Series: | BMC Research Notes |
Subjects: | Genome annotation, Splicing, Araport11, Gene prediction hints, Reciprocal best hit |
Online Access: | http://link.springer.com/article/10.1186/s13104-017-2985-y |
_version_ | 1818037254161956864 |
---|---|
author | Boas Pucker Daniela Holtgräwe Bernd Weisshaar |
author_sort | Boas Pucker |
collection | DOAJ |
description | Abstract. Objective: The Arabidopsis thaliana Niederzenz-1 genome sequence was recently published with an ab initio gene prediction. In-depth analysis of the predicted gene set revealed some errors involving genes with non-canonical splice sites in their introns. Since non-canonical splice sites are difficult to predict ab initio, we checked for options to improve the annotation by transferring annotation information from Araport11, the recently released annotation of the Columbia-0 reference genome sequence. Results: Incorporation of hints generated from Araport11 enabled the precise prediction of non-canonical splice sites. Manual inspection of RNA-Seq read mappings and RT-PCR were applied to validate the structural annotations of non-canonical splice sites. Predictions of untranslated regions were also updated using Araport11’s information, which was generated from high-coverage RNA-Seq data. The improved gene set of the Nd-1 genome assembly (GeneSet_Nd-1_v1.1) was evaluated by comparison with the initial gene prediction (GeneSet_Nd-1_v1.0) and with Araport11 for the Col-0 reference genome sequence. GeneSet_Nd-1_v1.1 contains previously missed non-canonical splice sites in 1256 genes. Reciprocal best hits for 24,527 (89.4%) of all nuclear Col-0 genes against GeneSet_Nd-1_v1.1 indicate a high gene prediction quality. |
first_indexed | 2024-12-10T07:23:55Z |
format | Article |
id | doaj.art-7df92d38956d4ab2b5ee6015880bab9a |
institution | Directory Open Access Journal |
issn | 1756-0500 |
language | English |
last_indexed | 2024-12-10T07:23:55Z |
publishDate | 2017-12-01 |
publisher | BMC |
record_format | Article |
series | BMC Research Notes |
spelling | doaj.art-7df92d38956d4ab2b5ee6015880bab9a; 2022-12-22T01:57:45Z; eng; BMC; BMC Research Notes; 1756-0500; 2017-12-01; 10116; 10.1186/s13104-017-2985-y; Boas Pucker, Daniela Holtgräwe, Bernd Weisshaar (all: Faculty of Biology & Center for Biotechnology, Bielefeld University); http://link.springer.com/article/10.1186/s13104-017-2985-y; Genome annotation; Splicing; Araport11; Gene prediction hints; Reciprocal best hit |
title | Consideration of non-canonical splice sites improves gene prediction on the Arabidopsis thaliana Niederzenz-1 genome sequence |
topic | Genome annotation Splicing Araport11 Gene prediction hints Reciprocal best hit |
url | http://link.springer.com/article/10.1186/s13104-017-2985-y |
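
The description above reports reciprocal best hits (RBHs) for 24,527 (89.4%) of all nuclear Col-0 genes. As a rough illustration of what that metric measures, the sketch below pairs two genes only when each is the other's top-scoring match. It is a minimal sketch under stated assumptions, not the authors' implementation: the gene identifiers, the scores, and the helper names `best_hits` and `reciprocal_best_hits` are all hypothetical, and the similarity scores are assumed to come from some prior all-versus-all sequence search.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict mapping (query_id, subject_id) -> similarity score.
    Ties are resolved in favour of the first pair encountered.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return the set of (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


# Hypothetical toy scores; identifiers do not refer to real genes.
col_vs_nd = {
    ("col_g1", "nd_g1"): 950.0,
    ("col_g1", "nd_g2"): 400.0,
    ("col_g2", "nd_g2"): 700.0,
    ("col_g3", "nd_g2"): 650.0,
}
nd_vs_col = {
    ("nd_g1", "col_g1"): 940.0,
    ("nd_g2", "col_g2"): 710.0,
    ("nd_g2", "col_g3"): 640.0,
}

rbh = reciprocal_best_hits(col_vs_nd, nd_vs_col)
col_genes = {query for query, _ in col_vs_nd}
fraction = len(rbh) / len(col_genes)  # share of Col-0 genes with an RBH
```

In this toy case `col_g3` finds `nd_g2` as its best match, but `nd_g2` prefers `col_g2`, so `col_g3` has no RBH. The reported 89.4% is the same kind of ratio taken over all nuclear Col-0 genes.
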