MapReduce and its applications in heterogeneous environment

As the data growth rate outpaces the growth in CPU processing capability and datasets reach petascale, technologies and tools that can effectively process such huge datasets become increasingly important. Two major approaches are currently adopted to address this issue: the use of specialized hardware acceler...

Bibliographic Details
Main Author:	Tan, Yu Shyang
Other Authors:	Lee Bu Sung, Francis
Format:	Thesis
Language:	English
Published:	2011
Subjects:	DRNTU::Engineering::Computer science and engineering::Computer systems organization::Computer system implementation
Online Access:	https://hdl.handle.net/10356/46718

collection	NTU
description	As the data growth rate outpaces the growth in CPU processing capability and datasets reach petascale, technologies and tools that can effectively process such huge datasets become increasingly important. Two major approaches are currently adopted to address this issue: the use of specialized hardware accelerators such as GPGPUs, and the development of new data-intensive processing tools. For the former, the trend shows an increasing number of GPGPU clusters being used in high-performance computing. For the latter, Google introduced MapReduce, a framework coupled with a programming model for massively parallel distributed processing. In this thesis, I investigate the possibility of leveraging these two technologies to create an environment where users can harness the potential of hardware accelerators to process huge datasets in a distributed and parallel manner. Hadoop, an open-source implementation of MapReduce, is analysed first. This initial study looks into the performance of Hadoop when processing small datasets, something Hadoop is not designed for. The study uses several metrics, such as the input file size, the size of the dataset and the locality of data, and examines some of the parameters that can affect the performance of the MapReduce flow with respect to the dataset. The study provides insight into MapReduce and into how data can be decomposed into sub-partitions so that it can be managed by the accelerators with minimal negative impact on performance.
first_indexed	2024-10-01T06:37:22Z
id	ntu-10356/46718
institution	Nanyang Technological University
last_indexed	2024-10-01T06:37:22Z
record_format	dspace
spelling	ntu-10356/46718 2020-03-20T18:51:19Z MapReduce and its applications in heterogeneous environment Tan, Yu Shyang Lee Bu Sung, Francis School of Computer Engineering Parallel and Distributed Computing Centre MASTER OF ENGINEERING (SCE) 2011-12-23T06:59:54Z 2011-12-23T06:59:54Z 2011 2011 Thesis Tan, Y. S. (2011). MapReduce and its applications in heterogeneous environment. Master’s thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/46718 10.32657/10356/46718 en 87 p. application/msword

Similar Items