Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries

Abstract We present a novel data structure for searching sequences in large databases: the Hierarchical Interleaved Bloom Filter (HIBF). It is extremely fast and space efficient, yet so general that it could serve as the underlying engine for many applications. We show that the HIBF is superior in b...

Full description

Bibliographic Details
Main Authors: Svenja Mehringer, Enrico Seiler, Felix Droop, Mitra Darvish, René Rahn, Martin Vingron, Knut Reinert
Format: Article
Language:English
Published: BMC 2023-05-01
Series:Genome Biology
Subjects:
Online Access:https://doi.org/10.1186/s13059-023-02971-4
_version_ 1797811517108781056
author Svenja Mehringer
Enrico Seiler
Felix Droop
Mitra Darvish
René Rahn
Martin Vingron
Knut Reinert
author_facet Svenja Mehringer
Enrico Seiler
Felix Droop
Mitra Darvish
René Rahn
Martin Vingron
Knut Reinert
author_sort Svenja Mehringer
collection DOAJ
description Abstract We present a novel data structure for searching sequences in large databases: the Hierarchical Interleaved Bloom Filter (HIBF). It is extremely fast and space efficient, yet so general that it could serve as the underlying engine for many applications. We show that the HIBF is superior in build time, index size, and search time while achieving a comparable or better accuracy compared to other state-of-the-art tools. The HIBF builds an index up to 211 times faster, using up to 14 times less space, and can answer approximate membership queries faster by a factor of up to 129.
first_indexed 2024-03-13T07:23:50Z
format Article
id doaj.art-f6096bb2c23c40cc92f2bd277d69761a
institution Directory Open Access Journal
issn 1474-760X
language English
last_indexed 2024-03-13T07:23:50Z
publishDate 2023-05-01
publisher BMC
record_format Article
series Genome Biology
spelling doaj.art-f6096bb2c23c40cc92f2bd277d69761a2023-06-04T11:30:17ZengBMCGenome Biology1474-760X2023-05-0124112510.1186/s13059-023-02971-4Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queriesSvenja Mehringer0Enrico Seiler1Felix Droop2Mitra Darvish3René Rahn4Martin Vingron5Knut Reinert6Department of Mathematics and Computer Science, Freie Universität BerlinDepartment of Mathematics and Computer Science, Freie Universität BerlinDepartment of Mathematics and Computer Science, Freie Universität BerlinMPI for Molecular GeneticsMPI for Molecular GeneticsMPI for Molecular GeneticsDepartment of Mathematics and Computer Science, Freie Universität BerlinAbstract We present a novel data structure for searching sequences in large databases: the Hierarchical Interleaved Bloom Filter (HIBF). It is extremely fast and space efficient, yet so general that it could serve as the underlying engine for many applications. We show that the HIBF is superior in build time, index size, and search time while achieving a comparable or better accuracy compared to other state-of-the-art tools. The HIBF builds an index up to 211 times faster, using up to 14 times less space, and can answer approximate membership queries faster by a factor of up to 129.https://doi.org/10.1186/s13059-023-02971-4Approximate membership querySequence searchMiminizerAlignment free analysisBloom filterMetagenomics
spellingShingle Svenja Mehringer
Enrico Seiler
Felix Droop
Mitra Darvish
René Rahn
Martin Vingron
Knut Reinert
Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries
Genome Biology
Approximate membership query
Sequence search
Miminizer
Alignment free analysis
Bloom filter
Metagenomics
title Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries
title_full Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries
title_fullStr Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries
title_full_unstemmed Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries
title_short Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries
title_sort hierarchical interleaved bloom filter enabling ultrafast approximate sequence queries
topic Approximate membership query
Sequence search
Miminizer
Alignment free analysis
Bloom filter
Metagenomics
url https://doi.org/10.1186/s13059-023-02971-4
work_keys_str_mv AT svenjamehringer hierarchicalinterleavedbloomfilterenablingultrafastapproximatesequencequeries
AT enricoseiler hierarchicalinterleavedbloomfilterenablingultrafastapproximatesequencequeries
AT felixdroop hierarchicalinterleavedbloomfilterenablingultrafastapproximatesequencequeries
AT mitradarvish hierarchicalinterleavedbloomfilterenablingultrafastapproximatesequencequeries
AT renerahn hierarchicalinterleavedbloomfilterenablingultrafastapproximatesequencequeries
AT martinvingron hierarchicalinterleavedbloomfilterenablingultrafastapproximatesequencequeries
AT knutreinert hierarchicalinterleavedbloomfilterenablingultrafastapproximatesequencequeries