Adaptive scheduling in Spark

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.

Bibliographic Details
Main Author: Mahajan, Rohan
Other Authors: Zaharia, Matei
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology 2016
Subjects: Electrical Engineering and Computer Science
Online Access: http://hdl.handle.net/1721.1/105977
Record: mit-1721.1/105977 (2019-04-11T07:21:53Z)
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Degree: M. Eng.
Notes: This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (page 33).
Abstract: Because most data processing systems are distributed in nature, data must be transferred between machines. Currently, Spark, a prominent such system, predetermines the strategies for shuffling this data, but in certain situations, different shuffle strategies would improve performance. We add functionality to track metrics about the data during the job and appropriately adapt the shuffle strategy. We show improvements in ShuffledRDD performance, joins using Spark's RDD interface, and joins in Spark SQL.
Date: 2016-12-22T15:17:11Z
Identifier: 965643791
Extent: 33 pages
File Format: application/pdf
Rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582