Context-driven discovery of gene cassettes in mobile integrons using a computational grammar

Abstract. Background: Gene discovery algorithms typically examine sequence data for low level patterns. A novel method to computationally discover higher order DNA structures is presented, using a context sensitive grammar. The algorithm was applied to th...

Bibliographic Details
Main Authors: Schaeffer Jaron, Partridge Sally R, Coiera Enrico, Tsafnat Guy, Iredell Jon R
Format: Article
Language: English
Published: BMC 2009-09-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/10/281
_version_ 1811278620332654592
author Schaeffer Jaron
Partridge Sally R
Coiera Enrico
Tsafnat Guy
Iredell Jon R
author_sort Schaeffer Jaron
collection DOAJ
description Abstract. Background: Gene discovery algorithms typically examine sequence data for low level patterns. A novel method to computationally discover higher order DNA structures is presented, using a context sensitive grammar. The algorithm was applied to the discovery of gene cassettes associated with integrons. The discovery and annotation of antibiotic resistance genes in such cassettes is essential for effective monitoring of antibiotic resistance patterns and formulation of public health antibiotic prescription policies. Results: We discovered two new putative gene cassettes using the method, from 276 integron features and 978 GenBank sequences. The system achieved κ = 0.972 annotation agreement with an expert gold standard of 300 sequences. In rediscovery experiments, we deleted 789,196 cassette instances over 2030 experiments and correctly relabelled 85.6% (α ≥ 95%, E ≤ 1%, mean sensitivity = 0.86, specificity = 1, F-score = 0.93), with no false positives. Error analysis demonstrated that for 72,338 missed deletions, two adjacent deleted cassettes were labeled as a single cassette, increasing performance to 94.8% (mean sensitivity = 0.92, specificity = 1, F-score = 0.96). Conclusion: Using grammars we were able to represent heuristic background knowledge about large and complex structures in DNA. Importantly, we were also able to use the context embedded in the model to discover new putative antibiotic resistance gene cassettes. The method is complementary to existing automatic annotation systems which operate at the sequence level.
first_indexed 2024-04-13T00:39:06Z
format Article
id doaj.art-96d4d8b15ab945fc84b7d30ffe7e66d4
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-04-13T00:39:06Z
publishDate 2009-09-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-96d4d8b15ab945fc84b7d30ffe7e66d4 2022-12-22T03:10:14Z eng BMC BMC Bioinformatics 1471-2105 2009-09-01 10 1 281 10.1186/1471-2105-10-281 Context-driven discovery of gene cassettes in mobile integrons using a computational grammar Schaeffer Jaron; Partridge Sally R; Coiera Enrico; Tsafnat Guy; Iredell Jon R http://www.biomedcentral.com/1471-2105/10/281
title Context-driven discovery of gene cassettes in mobile integrons using a computational grammar
url http://www.biomedcentral.com/1471-2105/10/281