Evaluation of genomic island predictors using a comparative genomics approach

Abstract. Background: Genomic islands (GIs) are clusters of genes in prokaryotic genomes of probable horizontal origin. GIs are disproportionately associated with microbial adaptations of medical or environmental interest. Recently, multiple programs for...


Bibliographic Details
Main Authors: Brinkman Fiona SL, Hsiao William WL, Langille Morgan GI
Format: Article
Language: English
Published: BMC 2008-08-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/9/329
_version_ 1818120400573300736
author Brinkman Fiona SL
Hsiao William WL
Langille Morgan GI
collection DOAJ
description Abstract. Background: Genomic islands (GIs) are clusters of genes in prokaryotic genomes of probable horizontal origin. GIs are disproportionately associated with microbial adaptations of medical or environmental interest. Recently, multiple programs for automated detection of GIs have been developed that utilize sequence composition characteristics, such as G+C ratio and dinucleotide bias. To robustly evaluate the accuracy of such methods, we propose that a dataset of GIs be constructed using criteria that are independent of sequence composition-based analysis approaches. Results: We developed a comparative genomics approach (IslandPick) that identifies both very probable islands and non-island regions. The approach involves 1) flexible, automated selection of comparative genomes for each query genome, using a distance function that picks appropriate genomes for identification of GIs, 2) identification of regions unique to the query genome, compared with the chosen genomes (positive dataset), and 3) identification of regions conserved across all genomes (negative dataset). Using our constructed datasets, we investigated the accuracy of several sequence composition-based GI prediction tools. Conclusion: Our results indicate that AlienHunter has the highest recall but the lowest measured precision, while SIGI-HMM is the most precise method. SIGI-HMM and IslandPath/DIMOB have comparable overall highest accuracy. Our comparative genomics approach, IslandPick, was the most accurate when compared with a curated list of GIs, indicating that we have constructed suitable datasets. This represents the first evaluation of the accuracy of several sequence composition-based GI predictors using diverse and independent datasets that were not artificially constructed. The caveats associated with this analysis and proposals for optimal island prediction are discussed.
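The abstract describes scoring predictors against a positive dataset (probable islands) and a negative dataset (conserved, non-island regions). A minimal sketch of how precision, recall, and accuracy could be computed from such datasets follows. It assumes per-nucleotide scoring, 1-based inclusive coordinates, and non-overlapping positive and negative sets. The names `covered_positions` and `evaluate` and the toy coordinates are illustrative only and are not taken from the article or from IslandPick.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed convention for this sketch: 1-based, inclusive (start, end).
Interval = Tuple[int, int]


def covered_positions(intervals: Iterable[Interval]) -> Set[int]:
    """Expand intervals into the set of base positions they cover."""
    positions: Set[int] = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def evaluate(
    predicted: Iterable[Interval],
    positive: Iterable[Interval],
    negative: Iterable[Interval],
) -> Dict[str, float]:
    """Score predicted islands against positive and negative datasets.

    Bases outside both datasets are unlabelled and are ignored.
    """
    pred = covered_positions(predicted)
    pos = covered_positions(positive)
    neg = covered_positions(negative)

    tp = len(pred & pos)  # island bases that were predicted
    fp = len(pred & neg)  # non-island bases that were predicted
    fn = len(pos - pred)  # island bases that were missed
    tn = len(neg - pred)  # non-island bases correctly left out

    total = tp + fp + fn + tn
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "tn": tn,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


if __name__ == "__main__":
    # Toy coordinates, invented for illustration.
    result = evaluate(
        predicted=[(151, 250), (301, 320)],
        positive=[(101, 200)],
        negative=[(301, 500)],
    )
    print(result)
```

In the toy call, 50 predicted bases fall in the positive region and 20 in the negative region, while 50 island bases are missed and 180 non-island bases are correctly left out. This gives a precision of 50/70, a recall of 0.5, and an accuracy of 230/300. Expanding intervals into sets keeps the sketch short; for whole genomes an interval-overlap calculation would use less memory.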
first_indexed 2024-12-11T05:25:30Z
format Article
id doaj.art-6f3f1619935144b09a37c0c7422410c7
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-11T05:25:30Z
publishDate 2008-08-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-6f3f1619935144b09a37c0c7422410c7; 2022-12-22T01:19:35Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2008-08-01; 9; 1; 329; 10.1186/1471-2105-9-329; http://www.biomedcentral.com/1471-2105/9/329
title Evaluation of genomic island predictors using a comparative genomics approach
url http://www.biomedcentral.com/1471-2105/9/329