Tsunami: a learned multi-dimensional index for correlated data and skewed workloads

© 2020, VLDB Endowment. All rights reserved. Filtering data based on predicates is one of the most fundamental operations for any modern data warehouse. Techniques to accelerate the execution of filter expressions include clustered indexes, specialized sort orders (e.g., Z-order), multi-dimensional...


Bibliographic Details
Main Authors: Ding, Jialin, Nathan, Vikram, Alizadeh, Mohammad, Kraska, Tim
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: English
Published: VLDB Endowment 2021
Online Access: https://hdl.handle.net/1721.1/132295
_version_ 1826193833835102208
author Ding, Jialin
Nathan, Vikram
Alizadeh, Mohammad
Kraska, Tim
author2 Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
collection MIT
description © 2020, VLDB Endowment. All rights reserved. Filtering data based on predicates is one of the most fundamental operations for any modern data warehouse. Techniques to accelerate the execution of filter expressions include clustered indexes, specialized sort orders (e.g., Z-order), multi-dimensional indexes, and, for high selectivity queries, secondary indexes. However, these schemes are hard to tune and their performance is inconsistent. Recent work on learned multi-dimensional indexes has introduced the idea of automatically optimizing an index for a particular dataset and workload. However, the performance of that work suffers in the presence of correlated data and skewed query workloads, both of which are common in real applications. In this paper, we introduce Tsunami, which addresses these limitations to achieve up to 6× faster query performance and up to 8× smaller index size than existing learned multi-dimensional indexes, in addition to up to 11× faster query performance and 170× smaller index size than optimally-tuned traditional indexes.
first_indexed 2024-09-23T09:45:49Z
format Article
id mit-1721.1/132295
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T09:45:49Z
publishDate 2021
publisher VLDB Endowment
record_format dspace
spelling mit-1721.1/132295 2023-09-26T20:03:07Z Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science 2021-09-20T18:21:43Z 2021-09-20T18:21:43Z 2020 2021-01-11T18:24:45Z Article http://purl.org/eprint/type/ConferencePaper https://hdl.handle.net/1721.1/132295 en 10.14778/3425879.3425880 Proceedings of the VLDB Endowment Creative Commons Attribution-NonCommercial-NoDerivs License http://creativecommons.org/licenses/by-nc-nd/4.0/ application/pdf VLDB Endowment VLDB Endowment
title Tsunami: a learned multi-dimensional index for correlated data and skewed workloads
url https://hdl.handle.net/1721.1/132295