Adaptive scheduling in Spark
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.
Main Author: | Mahajan, Rohan |
---|---|
Other Authors: | Matei Zaharia |
Format: | Thesis |
Language: | eng |
Published: | Massachusetts Institute of Technology, 2016 |
Subjects: | Electrical Engineering and Computer Science. |
Online Access: | http://hdl.handle.net/1721.1/105977 |
_version_ | 1826207370681778176 |
---|---|
author | Mahajan, Rohan |
author2 | Matei Zaharia. |
author_facet | Matei Zaharia. Mahajan, Rohan |
author_sort | Mahajan, Rohan |
collection | MIT |
description | Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. |
first_indexed | 2024-09-23T13:48:36Z |
format | Thesis |
id | mit-1721.1/105977 |
institution | Massachusetts Institute of Technology |
language | eng |
last_indexed | 2024-09-23T13:48:36Z |
publishDate | 2016 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/105977 2019-04-11T07:21:53Z Adaptive scheduling in Spark Mahajan, Rohan Matei Zaharia. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (page 33). Because most data processing systems are distributed in nature, data must be transferred between machines. Currently, Spark, a prominent such system, predetermines the strategies for shuffling this data, but in certain situations, different shuffle strategies would improve performance. We add functionality to track metrics about the data during the job and appropriately adapt the shuffle strategy. We show improvements in ShuffledRDD performance, joins using Spark's RDD interface, and joins in Spark SQL. by Rohan Mahajan. M. Eng. 2016-12-22T15:17:11Z 2016-12-22T15:17:11Z 2016 2016 Thesis http://hdl.handle.net/1721.1/105977 965643791 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 33 pages application/pdf Massachusetts Institute of Technology |
spellingShingle | Electrical Engineering and Computer Science. Mahajan, Rohan Adaptive scheduling in Spark |
title | Adaptive scheduling in Spark |
title_full | Adaptive scheduling in Spark |
title_fullStr | Adaptive scheduling in Spark |
title_full_unstemmed | Adaptive scheduling in Spark |
title_short | Adaptive scheduling in Spark |
title_sort | adaptive scheduling in spark |
topic | Electrical Engineering and Computer Science. |
url | http://hdl.handle.net/1721.1/105977 |
work_keys_str_mv | AT mahajanrohan adaptiveschedulinginspark |