Improving the Performance of MapReduce for Small-Scale Cloud Processes Using a Dynamic Task Adjustment Mechanism

The MapReduce architecture can reliably distribute massive datasets to cloud worker nodes for processing. When each worker node processes the input data, the Map program generates intermediate data that are used by the Reduce program for integration. However, as the worker nodes process the MapReduc...

Full description

Bibliographic Details
Main Authors: Tzu-Chi Huang, Guo-Hao Huang, Ming-Fong Tsai
Format: Article
Language:English
Published: MDPI AG 2022-05-01
Series:Mathematics
Subjects:
Online Access:https://www.mdpi.com/2227-7390/10/10/1736
Description
Summary:The MapReduce architecture can reliably distribute massive datasets to cloud worker nodes for processing. When each worker node processes the input data, the Map program generates intermediate data that are used by the Reduce program for integration. However, as the worker nodes process the MapReduce tasks, there are differences in the number of intermediate data created, due to variation in the operating-system environments and the input data, which results in the phenomenon of laggard nodes and affects the completion time for each small-scale cloud application task. In this paper, we propose a dynamic task adjustment mechanism for an intermediate-data processing cycle prediction algorithm, with the aim of improving the execution performance of small-scale cloud applications. Our mechanism dynamically adjusts the number of Map and Reduce program tasks based on the intermediate-data processing capabilities of each cloud worker node, in order to mitigate the problem of performance degradation caused by the limitations on the Google Cloud platform (Hadoop cluster) due to the phenomenon of laggards. The proposed dynamic task adjustment mechanism was compared with a simulated Hadoop system in a performance analysis, and an improvement of at least 5% in the processing efficiency was found for a small-scale cloud application.
ISSN:2227-7390