Dynamic Job Ordering and Slot Configurations for MapReduce Workloads

MapReduce is a popular parallel computing paradigm for large-scale data processing in clusters and data centers. A MapReduce workload generally contains a set of jobs, each of which consists of multiple map tasks followed by multiple reduce tasks. Due to 1) that map tasks can only run in map slots a...

詳細記述

書誌詳細
主要な著者: Tang, Shanjiang, Lee, Bu-Sung, He, Bingsheng
その他の著者: School of Computer Engineering
フォーマット: Journal Article
言語:English
出版事項: 2016
主題:
オンライン・アクセス:https://hdl.handle.net/10356/80385
http://hdl.handle.net/10220/40666
その他の書誌記述
要約:MapReduce is a popular parallel computing paradigm for large-scale data processing in clusters and data centers. A MapReduce workload generally contains a set of jobs, each of which consists of multiple map tasks followed by multiple reduce tasks. Due to 1) that map tasks can only run in map slots and reduce tasks can only run in reduce slots, and 2) the general execution constraints that map tasks are executed before reduce tasks, different job execution orders and map/reduce slot configurations for a MapReduce workload have significantly different performance and system utilization. This paper proposes two classes of algorithms to minimize the makespan and the total completion time for an offline MapReduce workload. Our first class of algorithms focuses on the job ordering optimization for a MapReduce workload under a given map/reduce slot configuration. In contrast, our second class of algorithms considers the scenario that we can perform optimization for map/reduce slot configuration for a MapReduce workload. We perform simulations as well as experiments on Amazon EC2 and show that our proposed algorithms produce results that are up to 15 ~ 80 percent better than currently unoptimized Hadoop, leading to significant reductions in running time in practice.