A Novel Quality Measure and Correction Procedure for the Annotation of Microbial Translation Initiation Sites.

The identification of translation initiation sites (TISs) constitutes an important aspect of sequence-based genome analysis. An erroneous TIS annotation can impair the identification of regulatory elements and N-terminal signal peptides, and also may flaw the determination of descent, for any partic...

Bibliographic Details
Main Authors: Lex Overmars, Roland J Siezen, Christof Francke
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2015-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0133691
_version_ 1818407529491726336
author Lex Overmars
Roland J Siezen
Christof Francke
collection DOAJ
description The identification of translation initiation sites (TISs) is an important aspect of sequence-based genome analysis. An erroneous TIS annotation can impair the identification of regulatory elements and N-terminal signal peptides, and may also compromise the determination of descent for the gene concerned. We have formulated a reference-free method to score TIS annotation quality. The method is based on a comparison of the observed and expected distribution of all TISs in a particular genome, given prior gene calling. We have assessed the TIS annotations for all available NCBI RefSeq microbial genomes and found that approximately 87% are of appropriate quality, whereas 13% need substantial improvement. We have analyzed a number of factors that could affect TIS annotation quality, such as GC content, taxonomy, the fraction of genes with a Shine-Dalgarno sequence, and the year of publication. The analysis showed that only the first factor, GC content, has a clear effect. We then formulated a straightforward Principal Component Analysis (PCA)-based TIS identification strategy to self-organize and score potential TISs. The strategy is independent of reference data and a priori calculations. A representative set of 277 genomes was subjected to the analysis, and we found a clear increase in TIS annotation quality for the genomes with a low quality score. The PCA-based annotation was also compared with the annotation produced by the current reference tool, Prodigal. The comparison for the model genome of Escherichia coli K12 showed that the two methods complement each other and that prediction agreement can be used as an indicator of a correct TIS annotation. Importantly, the data suggest that adding a PCA-based strategy to a Prodigal prediction can be used to 'flag' TIS annotations for re-evaluation and, in addition, to evaluate a given annotation when a Prodigal annotation is lacking.
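The flagging idea in the description (agreement between predictions as an indicator of a correct TIS) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function name flag_tis, the three labels, and the choice to key genes by strand and stop coordinate are all assumptions made for illustration; the start coordinates would come from an existing annotation, a Prodigal run, and a PCA-based prediction that are parsed elsewhere.

```python
from typing import Dict, Tuple

# Hypothetical gene key: (strand, stop coordinate). Gene calling fixes the
# stop codon, so tools that call the same gene share this key even when
# they disagree on the start (the TIS).
GeneKey = Tuple[str, int]


def flag_tis(
    annotated: Dict[GeneKey, int],
    prodigal: Dict[GeneKey, int],
    pca: Dict[GeneKey, int],
) -> Dict[GeneKey, str]:
    """Label each annotated TIS (labels are illustrative, not from the paper).

    'supported' : every available prediction matches the annotated start
    'flag'      : at least one available prediction disagrees
    'unchecked' : neither method produced a prediction for this gene
    """
    labels: Dict[GeneKey, str] = {}
    for gene, start in annotated.items():
        # Collect whichever predictions exist; a missing Prodigal call
        # leaves the PCA-based prediction as the only check.
        predictions = [
            source[gene] for source in (prodigal, pca) if gene in source
        ]
        if not predictions:
            labels[gene] = "unchecked"
        elif all(p == start for p in predictions):
            labels[gene] = "supported"
        else:
            labels[gene] = "flag"
    return labels


if __name__ == "__main__":
    # Made-up coordinates, for demonstration only.
    annotated = {("+", 900): 100, ("-", 2000): 2600, ("+", 7000): 6400}
    prodigal = {("+", 900): 100, ("-", 2000): 2600}
    pca = {("+", 900): 100, ("-", 2000): 2570, ("+", 7000): 6400}
    for gene, label in flag_tis(annotated, prodigal, pca).items():
        print(gene, label)
```

Treating a single agreeing prediction as support is a simplification; a stricter variant could require both methods to agree before a TIS is accepted.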
first_indexed 2024-12-14T09:29:17Z
format Article
id doaj.art-e081371bcd0e4743aee3e764ed86d069
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-12-14T09:29:17Z
publishDate 2015-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-e081371bcd0e4743aee3e764ed86d069; 2022-12-21T23:08:08Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2015-01-01; 10(7): e0133691; 10.1371/journal.pone.0133691; https://doi.org/10.1371/journal.pone.0133691
title A Novel Quality Measure and Correction Procedure for the Annotation of Microbial Translation Initiation Sites.
url https://doi.org/10.1371/journal.pone.0133691