Configurable Online Multi-Tiered Storage in a Database Management System
Businesses of today produce data items on the order of millions on a daily basis. This is especially true in cloud environments, where much of this data comes in the form of logs and metrics about the performance and status of components in their cloud configurations. Maintaining efficient data stor...
Main Author: | DaCosta, Howard |
---|---|
Other Authors: | Curtis, Dorothy |
Format: | Thesis |
Published: | Massachusetts Institute of Technology 2023 |
Online Access: | https://hdl.handle.net/1721.1/151677 |
_version_ | 1826203682328281088 |
---|---|
author | DaCosta, Howard |
author2 | Curtis, Dorothy |
author_facet | Curtis, Dorothy DaCosta, Howard |
author_sort | DaCosta, Howard |
collection | MIT |
description | Businesses today produce millions of data items every day. This is especially true in cloud environments, where much of this data takes the form of logs and metrics about the performance and status of components in a cloud configuration. Maintaining efficient data storage and retrieval while customer data capacity grows is challenging. One reason is that newer data tends to be accessed more frequently, while older data must be archived for future analysis. Another is that keeping large amounts of data on fast storage disks is costly. One approach to this problem is a tiered storage system, in which new data is allocated to faster storage tiers and older data is pushed to lower tiers with slower retrieval times. This thesis presents a fully online and configurable design and implementation of tiered storage in a database management system (DBMS) [1, 2], which has been difficult in the past because of two key constraints: the immutability of its columns and its lack of atomicity for sub-partition-level operations. Without atomicity, no mechanism guarantees that a tenant’s data within a partition is moved or deleted completely, which can leave the system in indeterminate states that are difficult to identify and resolve. Because columns are immutable, data must be copied and inserted into other tiers, which creates duplicate data across tiers while a tenant is issuing queries. Although these constraints are the very optimizations that make this particular DBMS so performant for large analytical workloads, they are the key features that must be redesigned to build this system. The proof of concept developed here satisfies all of these requirements with an ingestion rate of 1 TB per day, minimal overhead, and about 70% in projected savings per instance, which could amount to hundreds of thousands of dollars saved per month in large production installations. |
first_indexed | 2024-09-23T12:41:23Z |
format | Thesis |
id | mit-1721.1/151677 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T12:41:23Z |
publishDate | 2023 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/1516772023-08-01T03:44:09Z Configurable Online Multi-Tiered Storage in a Database Management System DaCosta, Howard Curtis, Dorothy Ryabin, Aleks Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science M.Eng. 2023-07-31T19:58:16Z 2023-07-31T19:58:16Z 2023-06 2023-06-06T16:35:53.025Z Thesis https://hdl.handle.net/1721.1/151677 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
spellingShingle | DaCosta, Howard Configurable Online Multi-Tiered Storage in a Database Management System |
title | Configurable Online Multi-Tiered Storage in a Database Management System |
title_full | Configurable Online Multi-Tiered Storage in a Database Management System |
title_fullStr | Configurable Online Multi-Tiered Storage in a Database Management System |
title_full_unstemmed | Configurable Online Multi-Tiered Storage in a Database Management System |
title_short | Configurable Online Multi-Tiered Storage in a Database Management System |
title_sort | configurable online multi tiered storage in a database management system |
url | https://hdl.handle.net/1721.1/151677 |
work_keys_str_mv | AT dacostahoward configurableonlinemultitieredstorageinadatabasemanagementsystem |
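
The abstract describes an age-based tiering policy (new data on faster tiers, older data pushed to slower ones) and notes that copying immutable data between tiers can leave duplicates visible to queries. The following is a minimal illustrative sketch of those two ideas only, not the thesis's implementation: the tier names, age thresholds, function names, and the choice to prefer the copy on the fastest tier are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Tier:
    name: str            # hypothetical tier label
    max_age: timedelta   # data older than this belongs on a slower tier


# Hypothetical policy, ordered fastest to slowest; the last tier catches everything.
POLICY = [
    Tier("hot", timedelta(days=7)),
    Tier("warm", timedelta(days=90)),
    Tier("cold", timedelta.max),
]


def tier_for(created_at: datetime, now: datetime, policy=POLICY) -> str:
    """Return the name of the fastest tier whose age limit covers this item."""
    age = now - created_at
    for tier in policy:
        if age <= tier.max_age:
            return tier.name
    return policy[-1].name


def dedupe_across_tiers(rows, policy=POLICY):
    """Given (row_id, tier_name) pairs, keep one tier per row_id.

    Assumption for this sketch: when a row exists on several tiers
    (copied but not yet deleted from its source), the copy on the
    fastest tier is the one a query should see.
    """
    rank = {tier.name: i for i, tier in enumerate(policy)}
    chosen = {}
    for row_id, tier_name in rows:
        if row_id not in chosen or rank[tier_name] < rank[chosen[row_id]]:
            chosen[row_id] = tier_name
    return chosen


if __name__ == "__main__":
    now = datetime(2023, 6, 1, tzinfo=timezone.utc)
    print(tier_for(now - timedelta(days=30), now))
    print(dedupe_across_tiers([("a", "warm"), ("a", "hot"), ("b", "cold")]))
```

Making the policy a plain ordered list keeps the thresholds configurable without code changes, which mirrors the "configurable" goal stated in the abstract; how the thesis itself expresses configuration is not given in this record.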