Indexing strategies for rapid searches of short words in genome sequences.

Searching for matches between large collections of short (14-30 nucleotides) words and sequence databases comprising full genomes or transcriptomes is a common task in biological sequence analysis. We investigated the performance of simple indexing strategies for handling such tasks and developed tw...

Bibliographic Details
Main Authors: Christian Iseli, Giovanna Ambrosini, Philipp Bucher, C Victor Jongeneel
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2007-06-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0000579
author Christian Iseli
Giovanna Ambrosini
Philipp Bucher
C Victor Jongeneel
collection DOAJ
description Searching for matches between large collections of short (14-30 nucleotides) words and sequence databases comprising full genomes or transcriptomes is a common task in biological sequence analysis. We investigated the performance of simple indexing strategies for handling such tasks and developed two programs, fetchGWI and tagger, that index either the database or the query set. Either strategy outperforms megablast for searches with more than 10,000 probes. FetchGWI is shown to be a versatile tool for rapidly searching multiple genomes, whose performance is limited in most cases by the speed of access to the filesystem. We have made publicly available a Web interface for searching the human, mouse, and several other genomes and transcriptomes with oligonucleotide queries.
first_indexed 2024-12-16T09:29:58Z
format Article
id doaj.art-2c27c8ff963a4f1398a1929aab72594b
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-12-16T09:29:58Z
publishDate 2007-06-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-2c27c8ff963a4f1398a1929aab72594b
2022-12-21T22:36:34Z
eng
Public Library of Science (PLoS)
PLoS ONE
1932-6203
2007-06-01
2
6
e579
10.1371/journal.pone.0000579
Indexing strategies for rapid searches of short words in genome sequences.
Christian Iseli
Giovanna Ambrosini
Philipp Bucher
C Victor Jongeneel
Searching for matches between large collections of short (14-30 nucleotides) words and sequence databases comprising full genomes or transcriptomes is a common task in biological sequence analysis. We investigated the performance of simple indexing strategies for handling such tasks and developed two programs, fetchGWI and tagger, that index either the database or the query set. Either strategy outperforms megablast for searches with more than 10,000 probes. FetchGWI is shown to be a versatile tool for rapidly searching multiple genomes, whose performance is limited in most cases by the speed of access to the filesystem. We have made publicly available a Web interface for searching the human, mouse, and several other genomes and transcriptomes with oligonucleotide queries.
https://doi.org/10.1371/journal.pone.0000579
title Indexing strategies for rapid searches of short words in genome sequences.
url https://doi.org/10.1371/journal.pone.0000579