Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries
Abstract We present a novel data structure for searching sequences in large databases: the Hierarchical Interleaved Bloom Filter (HIBF). It is extremely fast and space efficient, yet so general that it could serve as the underlying engine for many applications. We show that the HIBF is superior in b...
Main Authors: | , , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: |
BMC
2023-05-01
|
Series: | Genome Biology |
Subjects: | |
Online Access: | https://doi.org/10.1186/s13059-023-02971-4 |
_version_ | 1797811517108781056 |
---|---|
author | Svenja Mehringer Enrico Seiler Felix Droop Mitra Darvish René Rahn Martin Vingron Knut Reinert |
author_facet | Svenja Mehringer Enrico Seiler Felix Droop Mitra Darvish René Rahn Martin Vingron Knut Reinert |
author_sort | Svenja Mehringer |
collection | DOAJ |
description | Abstract We present a novel data structure for searching sequences in large databases: the Hierarchical Interleaved Bloom Filter (HIBF). It is extremely fast and space efficient, yet so general that it could serve as the underlying engine for many applications. We show that the HIBF is superior in build time, index size, and search time while achieving a comparable or better accuracy compared to other state-of-the-art tools. The HIBF builds an index up to 211 times faster, using up to 14 times less space, and can answer approximate membership queries faster by a factor of up to 129. |
first_indexed | 2024-03-13T07:23:50Z |
format | Article |
id | doaj.art-f6096bb2c23c40cc92f2bd277d69761a |
institution | Directory Open Access Journal |
issn | 1474-760X |
language | English |
last_indexed | 2024-03-13T07:23:50Z |
publishDate | 2023-05-01 |
publisher | BMC |
record_format | Article |
series | Genome Biology |
spelling | doaj.art-f6096bb2c23c40cc92f2bd277d69761a2023-06-04T11:30:17ZengBMCGenome Biology1474-760X2023-05-0124112510.1186/s13059-023-02971-4Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queriesSvenja Mehringer0Enrico Seiler1Felix Droop2Mitra Darvish3René Rahn4Martin Vingron5Knut Reinert6Department of Mathematics and Computer Science, Freie Universität BerlinDepartment of Mathematics and Computer Science, Freie Universität BerlinDepartment of Mathematics and Computer Science, Freie Universität BerlinMPI for Molecular GeneticsMPI for Molecular GeneticsMPI for Molecular GeneticsDepartment of Mathematics and Computer Science, Freie Universität BerlinAbstract We present a novel data structure for searching sequences in large databases: the Hierarchical Interleaved Bloom Filter (HIBF). It is extremely fast and space efficient, yet so general that it could serve as the underlying engine for many applications. We show that the HIBF is superior in build time, index size, and search time while achieving a comparable or better accuracy compared to other state-of-the-art tools. The HIBF builds an index up to 211 times faster, using up to 14 times less space, and can answer approximate membership queries faster by a factor of up to 129.https://doi.org/10.1186/s13059-023-02971-4Approximate membership querySequence searchMiminizerAlignment free analysisBloom filterMetagenomics |
spellingShingle | Svenja Mehringer Enrico Seiler Felix Droop Mitra Darvish René Rahn Martin Vingron Knut Reinert Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries Genome Biology Approximate membership query Sequence search Miminizer Alignment free analysis Bloom filter Metagenomics |
title | Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries |
title_full | Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries |
title_fullStr | Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries |
title_full_unstemmed | Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries |
title_short | Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries |
title_sort | hierarchical interleaved bloom filter enabling ultrafast approximate sequence queries |
topic | Approximate membership query Sequence search Miminizer Alignment free analysis Bloom filter Metagenomics |
url | https://doi.org/10.1186/s13059-023-02971-4 |
work_keys_str_mv | AT svenjamehringer hierarchicalinterleavedbloomfilterenablingultrafastapproximatesequencequeries AT enricoseiler hierarchicalinterleavedbloomfilterenablingultrafastapproximatesequencequeries AT felixdroop hierarchicalinterleavedbloomfilterenablingultrafastapproximatesequencequeries AT mitradarvish hierarchicalinterleavedbloomfilterenablingultrafastapproximatesequencequeries AT renerahn hierarchicalinterleavedbloomfilterenablingultrafastapproximatesequencequeries AT martinvingron hierarchicalinterleavedbloomfilterenablingultrafastapproximatesequencequeries AT knutreinert hierarchicalinterleavedbloomfilterenablingultrafastapproximatesequencequeries |