Enhancing Cloud Database Performance: General-Purpose Compression and Workload-Driven Layout

Cloud-based disaggregated database systems that divide data across a data layer and a storage layer connected by network calls are popular for analytical query loads. This thesis explores two topics critical to building performant systems of this type: space optimization and latency minimization....


Bibliographic Details
Main Author: Piszczek, Miloslawa
Other Authors: Kraska, Tim
Format: Thesis
Published: Massachusetts Institute of Technology 2024
Online Access: https://hdl.handle.net/1721.1/153856
collection MIT
description Cloud-based disaggregated database systems that divide data across a data layer and a storage layer connected by network calls are popular for analytical query workloads. This thesis explores two topics critical to building performant systems of this type: space optimization and latency minimization. First, I propose ColumnConstruct, a general-purpose machine learning compression method that uses a novel information-maximizing approach to building input features. ColumnConstruct is competitive with existing ML compression methods for categorical data, but it cannot perform lossless compression on arbitrary tabular data. This limitation, together with the additional compression and decompression latency, makes it insufficient to improve query latency within a database management system. Next, I investigate whether workload-aware data layout combined with caching can improve query times without the need for ML-based compression or storage-layer computation pushdown. I show that for small cache sizes and homogeneous query sets, a workload-aware layout combined with existing compression methods can be more effective than computation pushdown, without relying on particular features of the data storage layer.
id mit-1721.1/153856
record mit-1721.1/153856 (2024-03-22T03:16:46Z)
department Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
degree M.Eng.
dates 2024-03-21T19:10:59Z 2024-03-21T19:10:59Z 2024-02 2024-03-04T16:38:10.997Z
rights In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
mimetype application/pdf