Indexing strategies for rapid searches of short words in genome sequences.
Main Authors: | Christian Iseli, Giovanna Ambrosini, Philipp Bucher, C Victor Jongeneel |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2007-06-01 |
Series: | PLoS ONE |
Online Access: | https://doi.org/10.1371/journal.pone.0000579 |
_version_ | 1818588766145609728 |
---|---|
author | Christian Iseli, Giovanna Ambrosini, Philipp Bucher, C Victor Jongeneel |
collection | DOAJ |
description | Searching for matches between large collections of short words (14-30 nucleotides) and sequence databases comprising full genomes or transcriptomes is a common task in biological sequence analysis. We investigated the performance of simple indexing strategies for handling such tasks and developed two programs, fetchGWI and tagger, that index either the database or the query set. Either strategy outperforms megablast for searches with more than 10,000 probes. FetchGWI is shown to be a versatile tool for rapidly searching multiple genomes; in most cases, its performance is limited by the speed of access to the filesystem. We have made a Web interface publicly available for searching the human, mouse, and several other genomes and transcriptomes with oligonucleotide queries. |
first_indexed | 2024-12-16T09:29:58Z |
format | Article |
id | doaj.art-2c27c8ff963a4f1398a1929aab72594b |
institution | Directory of Open Access Journals |
issn | 1932-6203 |
language | English |
last_indexed | 2024-12-16T09:29:58Z |
publishDate | 2007-06-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
spelling | doaj.art-2c27c8ff963a4f1398a1929aab72594b; 2022-12-21T22:36:34Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2007-06-01; volume 2; issue 6; e579; 10.1371/journal.pone.0000579 |
title | Indexing strategies for rapid searches of short words in genome sequences. |
url | https://doi.org/10.1371/journal.pone.0000579 |
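The description contrasts two ways of organizing an exact search for many short words: index the database once and look each probe up in it, or index the probe set and stream the database past it. Below is a minimal Python sketch of both ideas, assuming exact, forward-strand matches and a seed length `k` no longer than the shortest probe. The function names (`search_indexed_database`, `search_indexed_queries`) and the toy data are hypothetical; this is not how fetchGWI or tagger are implemented, and their actual index formats are not described in this record.

```python
from collections import defaultdict


def _check_probes(probes, k):
    # This sketch seeds on the first k letters, so every probe must be at least k long.
    for name, probe in probes.items():
        if len(probe) < k:
            raise ValueError(f"probe {name!r} is shorter than k={k}")


def search_indexed_database(seq, probes, k):
    """Strategy 1 (hypothetical sketch): index the database, then look up each probe."""
    _check_probes(probes, k)
    # Map every k-mer of the database sequence to the positions where it starts.
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    hits = []
    for name, probe in probes.items():
        # Seed lookup on the probe's first k letters, then verify the full probe.
        for pos in index.get(probe[:k], []):
            if seq.startswith(probe, pos):
                hits.append((name, pos))
    return sorted(hits)


def search_indexed_queries(seq, probes, k):
    """Strategy 2 (hypothetical sketch): index the probes, then scan the database once."""
    _check_probes(probes, k)
    # Group probes by their first k letters so each database position needs one lookup.
    by_seed = defaultdict(list)
    for name, probe in probes.items():
        by_seed[probe[:k]].append((name, probe))
    hits = []
    for i in range(len(seq) - k + 1):
        # Verify every probe that shares this seed against the database at position i.
        for name, probe in by_seed.get(seq[i:i + k], []):
            if seq.startswith(probe, i):
                hits.append((name, i))
    return sorted(hits)


if __name__ == "__main__":
    genome = "ACGTACGTTTGACGTACGA"  # toy sequence, not real data
    probes = {"p1": "ACGTAC", "p2": "TTGACG", "p3": "GGGG"}
    print(search_indexed_database(genome, probes, 4))  # [('p1', 0), ('p1', 11), ('p2', 8)]
    print(search_indexed_queries(genome, probes, 4))   # [('p1', 0), ('p1', 11), ('p2', 8)]
```

Both functions return the same hits; in this sketch they differ only in what is built up front. The database index grows with the sequence length but can be reused across many query sets, whereas the probe index holds only the queries and requires a full pass over the database for each batch.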