Enhancing Cloud Database Performance: General-Purpose Compression and Workload-Driven Layout
Cloud-based disaggregated database systems that divide data across a data layer and a storage layer connected by network calls are popular for analytical query loads. This thesis explores two topics critical to building performant systems of this type: space optimization and latency minimization....
Main Author: | Piszczek, Miloslawa |
---|---|
Other Authors: | Kraska, Tim |
Format: | Thesis |
Published: | Massachusetts Institute of Technology 2024 |
Online Access: | https://hdl.handle.net/1721.1/153856 |
_version_ | 1826210920346419200 |
---|---|
author | Piszczek, Miloslawa |
author2 | Kraska, Tim |
author_facet | Kraska, Tim Piszczek, Miloslawa |
author_sort | Piszczek, Miloslawa |
collection | MIT |
description | Cloud-based disaggregated database systems that divide data across a data layer and a storage layer connected by network calls are popular for analytical query loads. This thesis explores two topics critical to building performant systems of this type: space optimization and latency minimization.
First, I propose ColumnConstruct, a general-purpose machine learning compression technique that uses a novel information-maximizing method for building input features. ColumnConstruct is competitive with existing ML compression methods for categorical data, but it cannot perform lossless compression on arbitrary tabular data. This limitation, together with the additional compression and decompression latency, makes it insufficient to improve query latency within a database management system. Next, I investigate whether workload-aware data layout combined with caching can improve query times without the need for ML-based compression or storage layer computation pushdown. I show that for small cache sizes and homogeneous query sets, a workload-aware layout combined with existing compression methods can be more effective than computation pushdown, without relying on particular features in the data storage layer. |
first_indexed | 2024-09-23T14:57:34Z |
format | Thesis |
id | mit-1721.1/153856 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T14:57:34Z |
publishDate | 2024 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/153856 2024-03-22T03:16:46Z Enhancing Cloud Database Performance: General-Purpose Compression and Workload-Driven Layout Piszczek, Miloslawa Kraska, Tim Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Cloud-based disaggregated database systems that divide data across a data layer and a storage layer connected by network calls are popular for analytical query loads. This thesis explores two topics critical to building performant systems of this type: space optimization and latency minimization. First, I propose ColumnConstruct, a general-purpose machine learning compression technique that uses a novel information-maximizing method for building input features. ColumnConstruct is competitive with existing ML compression methods for categorical data, but it cannot perform lossless compression on arbitrary tabular data. This limitation, together with the additional compression and decompression latency, makes it insufficient to improve query latency within a database management system. Next, I investigate whether workload-aware data layout combined with caching can improve query times without the need for ML-based compression or storage layer computation pushdown. I show that for small cache sizes and homogeneous query sets, a workload-aware layout combined with existing compression methods can be more effective than computation pushdown, without relying on particular features in the data storage layer. M.Eng. 2024-03-21T19:10:59Z 2024-03-21T19:10:59Z 2024-02 2024-03-04T16:38:10.997Z Thesis https://hdl.handle.net/1721.1/153856 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
spellingShingle | Piszczek, Miloslawa Enhancing Cloud Database Performance: General-Purpose Compression and Workload-Driven Layout |
title | Enhancing Cloud Database Performance: General-Purpose Compression and Workload-Driven Layout |
title_full | Enhancing Cloud Database Performance: General-Purpose Compression and Workload-Driven Layout |
title_fullStr | Enhancing Cloud Database Performance: General-Purpose Compression and Workload-Driven Layout |
title_full_unstemmed | Enhancing Cloud Database Performance: General-Purpose Compression and Workload-Driven Layout |
title_short | Enhancing Cloud Database Performance: General-Purpose Compression and Workload-Driven Layout |
title_sort | enhancing cloud database performance general purpose compression and workload driven layout |
url | https://hdl.handle.net/1721.1/153856 |
work_keys_str_mv | AT piszczekmiloslawa enhancingclouddatabaseperformancegeneralpurposecompressionandworkloaddrivenlayout |