Improving lookup and query execution performance in distributed Big Data systems using Cuckoo Filter

Bibliographic Details
Main Authors: Sharafat Ibn Mollah Mosharraf, Muhammad Abdullah Adnan
Format: Article
Language: English
Published: SpringerOpen 2022-01-01
Series: Journal of Big Data
Online Access:https://doi.org/10.1186/s40537-022-00563-w

Abstract

Performance is a critical concern when reading and writing data across billions of records stored in a Big Data warehouse. We identify two opportunities for improving query performance. The first is the performance of lookup queries after data deletion in Big Data systems that use eventual consistency; we propose a scheme that uses a Cuckoo Filter to improve lookup performance after deletion. The second is avoiding unnecessary network round-trips to remote nodes in a distributed Big Data cluster when those nodes are known not to hold the requested data partition; we propose a scheme in which probabilistic filters are consulted before querying remote nodes, so that queries that would return no data never travel over the network. We evaluate both schemes on Cassandra with a real dataset and show that each can improve lookup query performance by up to 2x.
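The abstract gives no implementation details, so what follows is only an illustrative Python sketch of a generic cuckoo filter (partial-key cuckoo hashing), not the authors' scheme. Cuckoo filters support deletion, which a standard Bloom filter does not; that property is plausibly what matters for the post-deletion lookup scheme, though the abstract does not say how the filter is maintained in Cassandra. A negative lookup means the key is definitely absent, which is what lets a caller skip work such as a remote query. The parameters (8-bit fingerprints, 4-slot buckets, SHA-256 hashing), the class `CuckooFilter`, and the helpers `read_partition` and `fetch_remote` are hypothetical and chosen only for this example.

```python
import hashlib
import random


class CuckooFilter:
    """Minimal cuckoo filter using partial-key cuckoo hashing.

    Illustrative assumptions (not taken from the paper): 8-bit fingerprints,
    4 slots per bucket, a power-of-two bucket count and SHA-256 hashing.
    """

    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500, seed=0):
        if num_buckets <= 0 or num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a power of two")
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)  # seeded so evictions are reproducible

    @staticmethod
    def _hash(data: bytes) -> int:
        # First 8 bytes of SHA-256 as an unsigned 64-bit integer
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint_and_index(self, key: str):
        h = self._hash(key.encode())
        fp = ((h >> 32) & 0xFF) or 1  # non-zero 8-bit fingerprint (0 conventionally means empty)
        return fp, h & (self.num_buckets - 1)  # primary bucket from the low bits

    def _alt_index(self, index: int, fp: int) -> int:
        # XOR with the fingerprint's hash; applying it twice yields the original index,
        # so a fingerprint can move between its two buckets without knowing the key.
        return (index ^ self._hash(bytes([fp]))) & (self.num_buckets - 1)

    def insert(self, key: str) -> bool:
        fp, i1 = self._fingerprint_and_index(key)
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets full: evict a random fingerprint and relocate it.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(len(self.buckets[i]))
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Give up; this sketch drops the last evicted fingerprint (a real filter would stash it).
        return False

    def contains(self, key: str) -> bool:
        # False means "definitely absent"; True may be a false positive.
        fp, i1 = self._fingerprint_and_index(key)
        return fp in self.buckets[i1] or fp in self.buckets[self._alt_index(i1, fp)]

    def delete(self, key: str) -> bool:
        # Only delete keys that were inserted, or another key's fingerprint may be removed.
        fp, i1 = self._fingerprint_and_index(key)
        for i in (i1, self._alt_index(i1, fp)):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False


def read_partition(partition_key, node_filter, fetch_remote):
    """Hypothetical gate: consult a node's filter before paying for a network round-trip."""
    if not node_filter.contains(partition_key):
        return None  # definitely not on that node, so skip the remote query
    return fetch_remote(partition_key)  # may still find nothing (false positive)


if __name__ == "__main__":
    node_filter = CuckooFilter()
    for k in ("user:1", "user:2", "user:3"):
        node_filter.insert(k)
    node_filter.delete("user:2")  # deletion is supported, unlike a standard Bloom filter
    print(read_partition("user:1", node_filter, lambda k: f"row for {k}"))  # prints "row for user:1"
```

Because `contains` can return false positives but never false negatives (provided no insertion has failed and only inserted keys are deleted), a gate like `read_partition` can only skip queries that would have returned no data; it never hides rows that exist.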

Keywords: Big Data; Distributed systems; Query optimization; Probabilistic data structure; Bloom filter; Cuckoo Filter
ISSN: 2196-1115
Affiliation (both authors): Department of Computer Science & Engineering, Bangladesh University of Engineering & Technology (BUET)
Indexed in: Directory of Open Access Journals (DOAJ)