MRPack: Multi-Algorithm Execution Using Compute-Intensive Approach in MapReduce.

Large quantities of data have been generated from multiple sources at exponential rates in the last few years. These data are generated at high velocity as real-time and streaming data in a variety of formats. These characteristics give rise to challenges in their modeling, computation, and processing...


Bibliographic Details
Main Authors:	Muhammad Idris, Shujaat Hussain, Muhammad Hameed Siddiqi, Waseem Hassan, Hafiz Syed Muhammad Bilal, Sungyoung Lee
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2015-01-01
Series:	PLoS ONE
Online Access:	http://europepmc.org/articles/PMC4549337?pdf=render

_version_	1818200164286857216
author	Muhammad Idris; Shujaat Hussain; Muhammad Hameed Siddiqi; Waseem Hassan; Hafiz Syed Muhammad Bilal; Sungyoung Lee
author_facet	Muhammad Idris; Shujaat Hussain; Muhammad Hameed Siddiqi; Waseem Hassan; Hafiz Syed Muhammad Bilal; Sungyoung Lee
author_sort	Muhammad Idris
collection	DOAJ
description	Large quantities of data have been generated from multiple sources at exponential rates in the last few years. These data are generated at high velocity as real-time and streaming data in a variety of formats. These characteristics give rise to challenges in their modeling, computation, and processing. Hadoop MapReduce (MR) is a well-known data-intensive distributed processing framework that uses the distributed file system (DFS) for Big Data. Current implementations of MR support execution of only a single algorithm across the entire Hadoop cluster. In this paper, we propose MapReducePack (MRPack), a variation of MR that supports execution of a set of related algorithms in a single MR job. We exploit the computational capability of a cluster by increasing the compute-intensiveness of MapReduce while maintaining its data-intensive approach. MRPack uses the available computing resources by dynamically managing task assignment and intermediate data. Intermediate data from multiple algorithms are managed using multi-key and skew mitigation strategies. The performance study of the proposed system shows that it is time-, I/O-, and memory-efficient compared to the default MapReduce. The proposed approach reduces the execution time by 200% with an approximate 50% decrease in I/O cost. Complexity analysis and qualitative results show significant performance improvement.
first_indexed	2024-12-12T02:33:18Z
format	Article
id	doaj.art-c112b161579d4effadc483c14de4ab04
institution	Directory Open Access Journal
issn	1932-6203
language	English
last_indexed	2024-12-12T02:33:18Z
publishDate	2015-01-01
publisher	Public Library of Science (PLoS)
record_format	Article
series	PLoS ONE
spelling	doaj.art-c112b161579d4effadc483c14de4ab04; 2022-12-22T00:41:21Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2015-01-01; 10; 8; e0136259; 10.1371/journal.pone.0136259; MRPack: Multi-Algorithm Execution Using Compute-Intensive Approach in MapReduce.; Muhammad Idris; Shujaat Hussain; Muhammad Hameed Siddiqi; Waseem Hassan; Hafiz Syed Muhammad Bilal; Sungyoung Lee; http://europepmc.org/articles/PMC4549337?pdf=render
title	MRPack: Multi-Algorithm Execution Using Compute-Intensive Approach in MapReduce.
url	http://europepmc.org/articles/PMC4549337?pdf=render

