Rethinking Update-in-Place Key-Value Stores for Modern Storage

Several widely-used key-value stores, like RocksDB, are designed around log-structured merge trees (LSMs). Optimizing for the performance characteristics of HDDs, LSMs provide good write performance by emphasizing sequential access to storage. However, this approach negatively impacts read performan...

Full description

Bibliographic Details
Main Author: Markakis, Markos
Other Authors: Kraska, Tim
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access:https://hdl.handle.net/1721.1/144764
https://orcid.org/ 0000-0003-2851-8840
_version_ 1811096995243229184
author Markakis, Markos
author2 Kraska, Tim
author_facet Kraska, Tim
Markakis, Markos
author_sort Markakis, Markos
collection MIT
description Several widely-used key-value stores, like RocksDB, are designed around log-structured merge trees (LSMs). Optimizing for the performance characteristics of HDDs, LSMs provide good write performance by emphasizing sequential access to storage. However, this approach negatively impacts read performance: LSMs must employ expensive compaction jobs and memory-consuming Bloom filters in order to achieve reasonably fast reads. In the era of NVMe SSDs, we argue that this trade-off between read performance and write performance is sub-optimal. With enough parallelism, modern storage media have comparable random and sequential access performance, making update-in-place designs, which traditionally provide high read performance, a viable alternative to LSMs. In this thesis, based on a research paper currently under submission, we close the gap between log-structured and update-in-place designs on modern SSDs by taking advantage of data and workload patterns. Specifically, we explore three key ideas: (A) record caching for efficient point operations, (B) page grouping for high-performance range scans, and (C) insert forecasting to reduce the reorganization costs of accommodating new records. We evaluate these ideas by implementing them in a prototype update-in-place key-value store called TreeLine. On YCSB, we find that TreeLine outperforms RocksDB and LeanStore by 2.18× and 2.05× respectively on average across the point workloads, and by up to 10.87× and 7.78× overall.
first_indexed 2024-09-23T16:52:40Z
format Thesis
id mit-1721.1/144764
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T16:52:40Z
publishDate 2022
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/1447642022-08-30T03:02:10Z Rethinking Update-in-Place Key-Value Stores for Modern Storage Markakis, Markos Kraska, Tim Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Several widely-used key-value stores, like RocksDB, are designed around log-structured merge trees (LSMs). Optimizing for the performance characteristics of HDDs, LSMs provide good write performance by emphasizing sequential access to storage. However, this approach negatively impacts read performance: LSMs must employ expensive compaction jobs and memory-consuming Bloom filters in order to achieve reasonably fast reads. In the era of NVMe SSDs, we argue that this trade-off between read performance and write performance is sub-optimal. With enough parallelism, modern storage media have comparable random and sequential access performance, making update-in-place designs, which traditionally provide high read performance, a viable alternative to LSMs. In this thesis, based on a research paper currently under submission, we close the gap between log-structured and update-in-place designs on modern SSDs by taking advantage of data and workload patterns. Specifically, we explore three key ideas: (A) record caching for efficient point operations, (B) page grouping for high-performance range scans, and (C) insert forecasting to reduce the reorganization costs of accommodating new records. We evaluate these ideas by implementing them in a prototype update-in-place key-value store called TreeLine. On YCSB, we find that TreeLine outperforms RocksDB and LeanStore by 2.18× and 2.05× respectively on average across the point workloads, and by up to 10.87× and 7.78× overall. S.M. 2022-08-29T16:10:09Z 2022-08-29T16:10:09Z 2022-05 2022-06-21T19:25:38.185Z Thesis https://hdl.handle.net/1721.1/144764 https://orcid.org/ 0000-0003-2851-8840 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
spellingShingle Markakis, Markos
Rethinking Update-in-Place Key-Value Stores for Modern Storage
title Rethinking Update-in-Place Key-Value Stores for Modern Storage
title_full Rethinking Update-in-Place Key-Value Stores for Modern Storage
title_fullStr Rethinking Update-in-Place Key-Value Stores for Modern Storage
title_full_unstemmed Rethinking Update-in-Place Key-Value Stores for Modern Storage
title_short Rethinking Update-in-Place Key-Value Stores for Modern Storage
title_sort rethinking update in place key value stores for modern storage
url https://hdl.handle.net/1721.1/144764
https://orcid.org/ 0000-0003-2851-8840
work_keys_str_mv AT markakismarkos rethinkingupdateinplacekeyvaluestoresformodernstorage