RNA-Seq improves annotation of protein-coding genes in the cucumber genome

Abstract. Background: As more and more genomes are sequenced, genome annotation becomes increasingly important in bridging the gap between sequence and biology. Gene prediction, which is at the center of genome annotation, usually integrates various resou...

Bibliographic Details
Main Authors: Fei Zhangjun, Yan Pengcheng, Huang Sanwen, Zhang Zhonghua, Li Zhen, Lin Kui
Format: Article
Language: English
Published: BMC 2011-11-01
Series: BMC Genomics
Online Access: http://www.biomedcentral.com/1471-2164/12/540
_version_ 1818583754098081792
author Fei Zhangjun
Yan Pengcheng
Huang Sanwen
Zhang Zhonghua
Li Zhen
Lin Kui
author_facet Fei Zhangjun
Yan Pengcheng
Huang Sanwen
Zhang Zhonghua
Li Zhen
Lin Kui
author_sort Fei Zhangjun
collection DOAJ
description Abstract. Background: As more and more genomes are sequenced, genome annotation becomes increasingly important in bridging the gap between sequence and biology. Gene prediction, which is at the center of genome annotation, usually integrates various resources to compute consensus gene structures. However, many newly sequenced genomes have limited resources for gene predictions. In an effort to create high-quality gene models of the cucumber genome (Cucumis sativus var. sativus), based on the EVidenceModeler gene prediction pipeline, we incorporated the massively parallel complementary DNA sequencing (RNA-Seq) reads of 10 cucumber tissues into EVidenceModeler. We applied the new pipeline to the reassembled cucumber genome and included a comparison between our predicted protein-coding gene sets and a published set. Results: The reassembled cucumber genome, annotated with RNA-Seq reads from 10 tissues, has 23,248 identified protein-coding genes. Compared with the published prediction in 2009, approximately 8,700 genes reveal structural modifications and 5,285 genes only appear in the reassembled cucumber genome. All the related results, including genome sequence and annotations, are available at http://cmb.bnu.edu.cn/Cucumis_sativus_v20/. Conclusions: We conclude that RNA-Seq greatly improves the accuracy of prediction of protein-coding genes in the reassembled cucumber genome. The comparison between the two gene sets also suggests that it is feasible to use RNA-Seq reads to annotate newly sequenced or less-studied genomes.
first_indexed 2024-12-16T08:10:18Z
format Article
id doaj.art-be7edefc6b364e97927efd423a32efee
institution Directory Open Access Journal
issn 1471-2164
language English
last_indexed 2024-12-16T08:10:18Z
publishDate 2011-11-01
publisher BMC
record_format Article
series BMC Genomics
spelling doaj.art-be7edefc6b364e97927efd423a32efee 2022-12-21T22:38:22Z eng BMC BMC Genomics 1471-2164 2011-11-01 12 1 540 10.1186/1471-2164-12-540 RNA-Seq improves annotation of protein-coding genes in the cucumber genome Fei Zhangjun, Yan Pengcheng, Huang Sanwen, Zhang Zhonghua, Li Zhen, Lin Kui Abstract. Background: As more and more genomes are sequenced, genome annotation becomes increasingly important in bridging the gap between sequence and biology. Gene prediction, which is at the center of genome annotation, usually integrates various resources to compute consensus gene structures. However, many newly sequenced genomes have limited resources for gene predictions. In an effort to create high-quality gene models of the cucumber genome (Cucumis sativus var. sativus), based on the EVidenceModeler gene prediction pipeline, we incorporated the massively parallel complementary DNA sequencing (RNA-Seq) reads of 10 cucumber tissues into EVidenceModeler. We applied the new pipeline to the reassembled cucumber genome and included a comparison between our predicted protein-coding gene sets and a published set. Results: The reassembled cucumber genome, annotated with RNA-Seq reads from 10 tissues, has 23,248 identified protein-coding genes. Compared with the published prediction in 2009, approximately 8,700 genes reveal structural modifications and 5,285 genes only appear in the reassembled cucumber genome. All the related results, including genome sequence and annotations, are available at http://cmb.bnu.edu.cn/Cucumis_sativus_v20/. Conclusions: We conclude that RNA-Seq greatly improves the accuracy of prediction of protein-coding genes in the reassembled cucumber genome. The comparison between the two gene sets also suggests that it is feasible to use RNA-Seq reads to annotate newly sequenced or less-studied genomes. http://www.biomedcentral.com/1471-2164/12/540
spellingShingle Fei Zhangjun
Yan Pengcheng
Huang Sanwen
Zhang Zhonghua
Li Zhen
Lin Kui
RNA-Seq improves annotation of protein-coding genes in the cucumber genome
BMC Genomics
title RNA-Seq improves annotation of protein-coding genes in the cucumber genome
title_full RNA-Seq improves annotation of protein-coding genes in the cucumber genome
title_fullStr RNA-Seq improves annotation of protein-coding genes in the cucumber genome
title_full_unstemmed RNA-Seq improves annotation of protein-coding genes in the cucumber genome
title_short RNA-Seq improves annotation of protein-coding genes in the cucumber genome
title_sort rna seq improves annotation of protein coding genes in the cucumber genome
url http://www.biomedcentral.com/1471-2164/12/540
work_keys_str_mv AT feizhangjun rnaseqimprovesannotationofproteincodinggenesinthecucumbergenome
AT yanpengcheng rnaseqimprovesannotationofproteincodinggenesinthecucumbergenome
AT huangsanwen rnaseqimprovesannotationofproteincodinggenesinthecucumbergenome
AT zhangzhonghua rnaseqimprovesannotationofproteincodinggenesinthecucumbergenome
AT lizhen rnaseqimprovesannotationofproteincodinggenesinthecucumbergenome
AT linkui rnaseqimprovesannotationofproteincodinggenesinthecucumbergenome