Configurable Online Multi-Tiered Storage in a Database Management System

Bibliographic Details
Main Author: DaCosta, Howard
Other Authors: Curtis, Dorothy
Format: Thesis
Published: Massachusetts Institute of Technology 2023
Online Access: https://hdl.handle.net/1721.1/151677
collection MIT
description Businesses today produce millions of data items every day. This is especially true in cloud environments, where much of this data takes the form of logs and metrics about the performance and status of components in their cloud configurations. Maintaining efficient data storage and retrieval while customer data capacity grows is very challenging. One reason is that newer data tends to be accessed more frequently, while older data needs to be archived for future analysis. Another is that keeping large amounts of data on fast storage disks is very costly. One approach to this problem is a tiered storage system, where new data is allocated to faster storage tiers and older data is pushed to lower tiers with slower retrieval times. This thesis presents a fully online and configurable design and implementation of tiered storage in a database management system (DBMS) [1, 2], which has been difficult in the past due to two key constraints: the immutability of its columns and its lack of atomicity for sub-partition-level operations. Without atomicity, no mechanism guarantees that a tenant’s data within a partition is moved or deleted completely, which can leave the system in indeterminate states that are difficult to identify and resolve. Because columns are immutable, data must be copied and inserted into other tiers, which raises the problem of duplicate data across tiers while a tenant is issuing queries. While these constraints are the very optimizations that make this particular DBMS so performant for large analytical uses, they are the key features that need to be redesigned in building this system. The proof of concept developed here satisfies all of these requirements with an ingestion rate of 1 TB per day, minimal overhead, and about 70% in projected savings per instance, which could amount to hundreds of thousands of dollars saved per month in large production installations.
first_indexed 2024-09-23T12:41:23Z
format Thesis
id mit-1721.1/151677
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T12:41:23Z
publishDate 2023
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/151677 2023-08-01T03:44:09Z Configurable Online Multi-Tiered Storage in a Database Management System DaCosta, Howard Curtis, Dorothy Ryabin, Aleks Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science M.Eng. 2023-07-31T19:58:16Z 2023-07-31T19:58:16Z 2023-06 2023-06-06T16:35:53.025Z Thesis https://hdl.handle.net/1721.1/151677 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology