An adaptive partitioning scheme for ad-hoc and time-varying database analytics

Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.

Bibliographic Details
Main Author: Shanbhag, Anil (Anil Atmanand)
Other Authors: Samuel Madden.
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology 2016
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/105961
Notes: This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 57-59).
Degree: S.M.
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Date: 2016-12-22T15:16:34Z
Extent: 62 pages
File format: application/pdf
Identifier: 965549381
Rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
Abstract: Data partitioning significantly improves query performance in distributed database systems. A large number of techniques have been proposed to efficiently partition a dataset, often focusing on finding the best partitioning for a particular query workload. However, many modern analytic applications involve ad-hoc or exploratory analysis where users do not have a representative query workload. Furthermore, workloads change over time as businesses evolve or as analysts gain better understanding of their data. Static workload-based data partitioning techniques are therefore not suitable for such settings. In this thesis, we present Amoeba, an adaptive distributed storage system for data skipping. It does not require an upfront query workload and adapts the data partitioning according to the queries posed by users over time. We present the data structures, partitioning algorithms, and an efficient implementation on top of Apache Spark and HDFS. Our experimental results show that the Amoeba storage system provides improved query performance for ad-hoc workloads, adapts to changes in the query workloads, and converges to a steady state in case of recurring workloads. On a real world workload, Amoeba reduces the total workload runtime by 1.8x compared to Spark with data partitioned and 3.4x compared to unmodified Spark.
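Illustration (not from the thesis): the abstract mentions data skipping and partitioning that adapts to the queries posed. The sketch below shows the general idea only, under loud assumptions. It is a toy single-attribute model, and `Partition`, `scan`, and `repartition` are hypothetical names chosen for this example. It is not Amoeba's data structure, algorithm, or API, and it uses neither Spark nor HDFS.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Partition:
    """A block of rows; each row is one integer value of a hypothetical attribute."""
    rows: List[int]

    @property
    def bounds(self) -> Tuple[int, int]:
        # Min/max metadata a reader can check without scanning the rows.
        return min(self.rows), max(self.rows)


def scan(partitions: List[Partition], lo: int, hi: int) -> Tuple[List[int], int]:
    """Answer the range query lo <= value <= hi.

    Returns the matching rows and the number of partitions actually read.
    """
    matches: List[int] = []
    partitions_read = 0
    for part in partitions:
        pmin, pmax = part.bounds
        if pmax < lo or pmin > hi:
            continue  # data skipping: this partition cannot contain a match
        partitions_read += 1
        matches.extend(v for v in part.rows if lo <= v <= hi)
    return matches, partitions_read


def repartition(partitions: List[Partition], cut: int) -> List[Partition]:
    """Toy stand-in for adaptation: re-split all rows at a query boundary."""
    rows = sorted(v for part in partitions for v in part.rows)
    left = [v for v in rows if v < cut]
    right = [v for v in rows if v >= cut]
    return [Partition(p) for p in (left, right) if p]


if __name__ == "__main__":
    parts = [Partition([1, 50, 99]), Partition([2, 60, 98])]
    print(scan(parts, 55, 100))                   # both partitions overlap the range
    print(scan(repartition(parts, 55), 55, 100))  # after re-splitting, one is skipped
```

In this toy run the initial layout forces both partitions to be read for the query on [55, 100], while the layout re-split at the query boundary lets one partition be skipped. A real system would also have to weigh the cost of rewriting data against the expected benefit, which this sketch ignores.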