Computational evaluation of TIS annotation for prokaryotic genomes

Abstract. Background: Accurate annotation of translation initiation sites (TISs) is essential for understanding the translation initiation mechanism. However, the reliability of TIS annotation in widely used databases such as RefSeq is uncertain due to th...

Bibliographic Details
Main Authors: Zhu Huaiqiu, Ju Li-Ning, Zheng Xiaobin, Hu Gang-Qing, She Zhen-Su
Format: Article
Language: English
Published: BMC 2008-03-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/9/160
author Zhu Huaiqiu
Ju Li-Ning
Zheng Xiaobin
Hu Gang-Qing
She Zhen-Su
collection DOAJ
description Abstract
Background: Accurate annotation of translation initiation sites (TISs) is essential for understanding the translation initiation mechanism. However, the reliability of TIS annotation in widely used databases such as RefSeq is uncertain due to the lack of experimental benchmarks.
Results: Based on a homogeneity assumption that gene translation-related signals are uniformly distributed across a genome, we have established a computational method for large-scale quantitative assessment of the reliability of TIS annotations for any prokaryotic genome. The method models a positional weight matrix (PWM) of aligned sequences around predicted TISs as a linear combination of three elementary PWMs, one for true TISs and two for false TISs. The three elementary PWMs are obtained from a reference set with highly reliable TIS predictions. A generalized least squares estimator determines the weight of the true-TIS component in the observed PWM, from which the accuracy of the prediction is derived. The validity of the method and the limits of its assumptions are explicitly addressed by testing on experimentally verified TISs with reference sets of varying accuracy. The method is applied to estimate the accuracy of TIS annotations provided by public databases such as RefSeq and ProTISA and by programs such as EasyGene, GeneMarkS, Glimmer 3 and TiCo. RefSeq's TIS prediction is shown to be significantly less accurate than two recent predictors, TiCo and ProTISA. We show, with supporting evidence, two general preferential biases in the RefSeq annotation, i.e., over-annotating the longest open reading frame (LORF) and under-annotating the ATG start codon. Finally, we have established a new TIS database, SupTISA, based on the best prediction of all the predictors; SupTISA achieves an average accuracy of 92% over all 532 complete genomes.
Conclusion: Large-scale computational evaluation of TIS annotation has been achieved. A new TIS database that is substantially more accurate than RefSeq has been constructed, and it provides a valuable resource for further TIS studies.
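The mixture idea in the description above can be illustrated with a minimal sketch. It assumes the observed PWM and the three elementary PWMs are plain frequency matrices of the same shape, and it approximates the generalized least squares step with a diagonal covariance (per-entry variances), which reduces to ordinary least squares when no variances are given. The function name `estimate_mixture_weights`, the argument names `false_a` and `false_b`, and the `variances` parameter are hypothetical; this record does not specify the authors' actual estimator, constraints, or accuracy formula.

```python
import numpy as np


def estimate_mixture_weights(observed, true_pwm, false_a, false_b, variances=None):
    """Fit observed ~ w0*true_pwm + w1*false_a + w2*false_b by least squares.

    All matrices must share one shape (e.g. 4 x window_length).
    `variances` (optional, same shape) gives assumed per-entry variances;
    this is a diagonal-covariance simplification of generalized least squares.
    Returns the three weights as a NumPy array; the first one is the
    estimated share of the true-TIS component.
    """
    y = np.asarray(observed, dtype=float).ravel()
    # Each elementary PWM becomes one column of the design matrix.
    X = np.column_stack(
        [np.asarray(m, dtype=float).ravel() for m in (true_pwm, false_a, false_b)]
    )
    if variances is None:
        scale = np.ones_like(y)
    else:
        scale = np.sqrt(np.asarray(variances, dtype=float).ravel())
    # Whitening: divide each row by its standard deviation, then solve
    # the resulting ordinary least-squares problem.
    weights, *_ = np.linalg.lstsq(X / scale[:, None], y / scale, rcond=None)
    return weights


if __name__ == "__main__":
    # Synthetic demo: random 4 x 12 PWMs mixed with known weights.
    rng = np.random.default_rng(0)
    t, a, b = (rng.dirichlet(np.ones(4), size=12).T for _ in range(3))
    mixed = 0.8 * t + 0.15 * a + 0.05 * b
    print(estimate_mixture_weights(mixed, t, a, b))
```

In this sketch the first returned weight plays the role of the true-TIS weighting; how that weight maps to a reported accuracy figure is not stated in this record and is left out here.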
first_indexed 2024-04-13T11:15:51Z
format Article
id doaj.art-a5a40640e28c4c63ba2dae7b9f9ed945
institution Directory of Open Access Journals
issn 1471-2105
language English
last_indexed 2024-04-13T11:15:51Z
publishDate 2008-03-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-a5a40640e28c4c63ba2dae7b9f9ed945 2022-12-22T02:48:58Z eng BMC BMC Bioinformatics 1471-2105 2008-03-01 9 1 160 10.1186/1471-2105-9-160
title Computational evaluation of TIS annotation for prokaryotic genomes
url http://www.biomedcentral.com/1471-2105/9/160