Choosing a cloud DBMS: architectures and tradeoffs

© 2019 VLDB Endowment. As analytic (OLAP) applications move to the cloud, DBMSs have shifted from employing a pure shared-nothing design with locally attached storage to a hybrid design that combines the use of shared-storage (e.g., AWS S3) with the use of shared-nothing query execution mechanisms....

Bibliographic details
Main authors: Tan, Junjay; Ghanem, Thanaa; Perron, Matthew; Yu, Xiangyao; Stonebraker, Michael; DeWitt, David; Serafini, Marco; Aboulnaga, Ashraf; Kraska, Tim
Format: Article
Language: English
Published in: VLDB Endowment 2021
Online access: https://hdl.handle.net/1721.1/132276
author Tan, Junjay
Ghanem, Thanaa
Perron, Matthew
Yu, Xiangyao
Stonebraker, Michael
DeWitt, David
Serafini, Marco
Aboulnaga, Ashraf
Kraska, Tim
collection MIT
description © 2019 VLDB Endowment. As analytic (OLAP) applications move to the cloud, DBMSs have shifted from employing a pure shared-nothing design with locally attached storage to a hybrid design that combines the use of shared-storage (e.g., AWS S3) with the use of shared-nothing query execution mechanisms. This paper sheds light on the resulting tradeoffs, which have not been properly identified in previous work. To this end, it evaluates the TPC-H benchmark across a variety of DBMS offerings running in a cloud environment (AWS) on fast 10Gb+ networks, specifically database-as-a-service offerings (Redshift, Athena), query engines (Presto, Hive), and a traditional cloud agnostic OLAP database (Vertica). While these comparisons cannot be apples-to-apples in all cases due to cloud configuration restrictions, we nonetheless identify patterns and design choices that are advantageous. These include prioritizing low-cost object stores like S3 for data storage, using system agnostic yet still performant columnar formats like ORC that allow easy switching to other systems for different workloads, and making features that benefit subsequent runs like query precompilation and caching remote data to faster storage optional rather than required because they disadvantage ad hoc queries.
first_indexed 2024-09-23T10:12:10Z
format Article
id mit-1721.1/132276
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T10:12:10Z
publishDate 2021
publisher VLDB Endowment
record_format dspace
spelling mit-1721.1/132276
2021-09-21T03:51:53Z
Choosing a cloud DBMS: architectures and tradeoffs
2021-09-20T18:21:37Z
2021-09-20T18:21:37Z
2021-01-11T15:10:36Z
Article
http://purl.org/eprint/type/ConferencePaper
https://hdl.handle.net/1721.1/132276
en
10.14778/3352063.3352133
Proceedings of the VLDB Endowment
Creative Commons Attribution-NonCommercial-NoDerivs License
http://creativecommons.org/licenses/by-nc-nd/4.0/
application/pdf
VLDB Endowment
title Choosing a cloud DBMS: architectures and tradeoffs
url https://hdl.handle.net/1721.1/132276