Rethinking Update-in-Place Key-Value Stores for Modern Storage
Several widely-used key-value stores, like RocksDB, are designed around log-structured merge trees (LSMs). Optimizing for the performance characteristics of HDDs, LSMs provide good write performance by emphasizing sequential access to storage. However, this approach negatively impacts read performan...
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis |
Published: |
Massachusetts Institute of Technology
2022
|
Online Access: | https://hdl.handle.net/1721.1/144764 https://orcid.org/ 0000-0003-2851-8840 |
_version_ | 1811096995243229184 |
---|---|
author | Markakis, Markos |
author2 | Kraska, Tim |
author_facet | Kraska, Tim Markakis, Markos |
author_sort | Markakis, Markos |
collection | MIT |
description | Several widely-used key-value stores, like RocksDB, are designed around log-structured merge trees (LSMs). Optimizing for the performance characteristics of HDDs, LSMs provide good write performance by emphasizing sequential access to storage. However, this approach negatively impacts read performance: LSMs must employ expensive compaction jobs and memory-consuming Bloom filters in order to achieve reasonably fast reads. In the era of NVMe SSDs, we argue that this trade-off between read performance and write performance is sub-optimal. With enough parallelism, modern storage media have comparable random and sequential access performance, making update-in-place designs, which traditionally provide high read performance, a viable alternative to LSMs.
In this thesis, based on a research paper currently under submission, we close the gap between log-structured and update-in-place designs on modern SSDs by taking advantage of data and workload patterns. Specifically, we explore three key ideas: (A) record caching for efficient point operations, (B) page grouping for high-performance range scans, and (C) insert forecasting to reduce the reorganization costs of accommodating new records. We evaluate these ideas by implementing them in a prototype update-in-place key-value store called TreeLine. On YCSB, we find that TreeLine outperforms RocksDB and LeanStore by 2.18× and 2.05× respectively on average across the point workloads, and by up to 10.87× and 7.78× overall. |
first_indexed | 2024-09-23T16:52:40Z |
format | Thesis |
id | mit-1721.1/144764 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T16:52:40Z |
publishDate | 2022 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/1447642022-08-30T03:02:10Z Rethinking Update-in-Place Key-Value Stores for Modern Storage Markakis, Markos Kraska, Tim Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Several widely-used key-value stores, like RocksDB, are designed around log-structured merge trees (LSMs). Optimizing for the performance characteristics of HDDs, LSMs provide good write performance by emphasizing sequential access to storage. However, this approach negatively impacts read performance: LSMs must employ expensive compaction jobs and memory-consuming Bloom filters in order to achieve reasonably fast reads. In the era of NVMe SSDs, we argue that this trade-off between read performance and write performance is sub-optimal. With enough parallelism, modern storage media have comparable random and sequential access performance, making update-in-place designs, which traditionally provide high read performance, a viable alternative to LSMs. In this thesis, based on a research paper currently under submission, we close the gap between log-structured and update-in-place designs on modern SSDs by taking advantage of data and workload patterns. Specifically, we explore three key ideas: (A) record caching for efficient point operations, (B) page grouping for high-performance range scans, and (C) insert forecasting to reduce the reorganization costs of accommodating new records. We evaluate these ideas by implementing them in a prototype update-in-place key-value store called TreeLine. On YCSB, we find that TreeLine outperforms RocksDB and LeanStore by 2.18× and 2.05× respectively on average across the point workloads, and by up to 10.87× and 7.78× overall. S.M. 2022-08-29T16:10:09Z 2022-08-29T16:10:09Z 2022-05 2022-06-21T19:25:38.185Z Thesis https://hdl.handle.net/1721.1/144764 https://orcid.org/ 0000-0003-2851-8840 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
spellingShingle | Markakis, Markos Rethinking Update-in-Place Key-Value Stores for Modern Storage |
title | Rethinking Update-in-Place Key-Value Stores for Modern
Storage |
title_full | Rethinking Update-in-Place Key-Value Stores for Modern
Storage |
title_fullStr | Rethinking Update-in-Place Key-Value Stores for Modern
Storage |
title_full_unstemmed | Rethinking Update-in-Place Key-Value Stores for Modern
Storage |
title_short | Rethinking Update-in-Place Key-Value Stores for Modern
Storage |
title_sort | rethinking update in place key value stores for modern storage |
url | https://hdl.handle.net/1721.1/144764 https://orcid.org/ 0000-0003-2851-8840 |
work_keys_str_mv | AT markakismarkos rethinkingupdateinplacekeyvaluestoresformodernstorage |