Learning Bit Allocations for Z-Order Layouts in Analytic Data Systems

aiDM ’24, June 14, 2024, Santiago, AA, Chile

Bibliographic Details
Main Authors: Gao, Jenny, Ding, Jialin, Sudhir, Sivaprasad, Madden, Samuel
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language:English
Published: ACM 2024
Online Access:https://hdl.handle.net/1721.1/155538
_version_ 1826193382224953344
author Gao, Jenny
Ding, Jialin
Sudhir, Sivaprasad
Madden, Samuel
author2 Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
author_facet Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Gao, Jenny
Ding, Jialin
Sudhir, Sivaprasad
Madden, Samuel
author_sort Gao, Jenny
collection MIT
description aiDM ’24, June 14, 2024, Santiago, AA, Chile
first_indexed 2024-09-23T09:38:09Z
format Article
id mit-1721.1/155538
institution Massachusetts Institute of Technology
language English
last_indexed 2025-02-19T04:18:15Z
publishDate 2024
publisher ACM
record_format dspace
spelling mit-1721.1/1555382024-12-23T05:33:41Z Learning Bit Allocations for Z-Order Layouts in Analytic Data Systems Gao, Jenny Ding, Jialin Sudhir, Sivaprasad Madden, Samuel Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory aiDM ’24, June 14, 2024, Santiago, AA, Chile To improve the performance of scanning and filtering, modern analytic data systems such as Amazon Redshift and Databricks Delta Lake give users the ability to sort a table using a Z-order, which maps each row to a "Z-value" by interleaving the binary representations of the row's attributes, then sorts rows by their Z-values. These Z-order layouts essentially sort the table by multiple columns simultaneously and can achieve superior performance to single-column sort orders when the user's queries filter over multiple columns. However, the user shoulders the burden of manually selecting the columns to include in the Z-order, and a poor choice of columns can significantly degrade performance. Furthermore, these systems treat all columns included in the Z-order as equally important, which often does not result in the best performance due to the unequal impact that different columns have on query performance. In this work, we investigate the performance impact of using Z-orders that place unequal importance on columns: instead of using an equal number of bits from each column in the Z-value interleaving, we allow unequal bit allocation. We introduce a technique that uses Bayesian optimization to automatically learn the best bit allocation for a Z-order layout on a given dataset and query workload. Z-order layouts using our learned bit allocations outperform equal-bit Z-orders by up to 1.6× in query runtime and up to 2× in rows scanned. 2024-07-09T16:30:32Z 2024-07-09T16:30:32Z 2024-06-09 2024-07-01T08:00:58Z Article http://purl.org/eprint/type/ConferencePaper 979-8-4007-0680-6 https://hdl.handle.net/1721.1/155538 Gao, Jenny, Ding, Jialin, Sudhir, Sivaprasad and Madden, Samuel. 2024. "Learning Bit Allocations for Z-Order Layouts in Analytic Data Systems." PUBLISHER_CC en 10.1145/3663742.3663975 Creative Commons Attribution https://creativecommons.org/licenses/by/4.0/ The author(s) application/pdf ACM Association for Computing Machinery
spellingShingle Gao, Jenny
Ding, Jialin
Sudhir, Sivaprasad
Madden, Samuel
Learning Bit Allocations for Z-Order Layouts in Analytic Data Systems
title Learning Bit Allocations for Z-Order Layouts in Analytic Data Systems
title_full Learning Bit Allocations for Z-Order Layouts in Analytic Data Systems
title_fullStr Learning Bit Allocations for Z-Order Layouts in Analytic Data Systems
title_full_unstemmed Learning Bit Allocations for Z-Order Layouts in Analytic Data Systems
title_short Learning Bit Allocations for Z-Order Layouts in Analytic Data Systems
title_sort learning bit allocations for z order layouts in analytic data systems
url https://hdl.handle.net/1721.1/155538
work_keys_str_mv AT gaojenny learningbitallocationsforzorderlayoutsinanalyticdatasystems
AT dingjialin learningbitallocationsforzorderlayoutsinanalyticdatasystems
AT sudhirsivaprasad learningbitallocationsforzorderlayoutsinanalyticdatasystems
AT maddensamuel learningbitallocationsforzorderlayoutsinanalyticdatasystems