iiHadoop: an asynchronous distributed framework for incremental iterative computations

Abstract It is true that data is never static; it keeps growing and changing over time. New data is added and old data can either be modified or deleted. This incremental nature of data motivates the development of new systems to perform large-scale data computations incrementally. MapReduce was rec...


Bibliographic Details
Main Authors:	Afaf G. Bin Saadon, Hoda M. O. Mokhtar
Format:	Article
Language:	English
Published:	SpringerOpen 2017-07-01
Series:	Journal of Big Data
Subjects:	Big data; Distributed systems; Hadoop framework; Iterative processing; Incremental computation
Online Access:	http://link.springer.com/article/10.1186/s40537-017-0086-3

author	Afaf G. Bin Saadon, Hoda M. O. Mokhtar
collection	DOAJ
description	Abstract: It is true that data is never static; it keeps growing and changing over time. New data is added, and old data can be either modified or deleted. This incremental nature of data motivates the development of new systems to perform large-scale data computations incrementally. MapReduce was recently introduced to provide an efficient approach for handling large-scale data computations. Nevertheless, it turned out to be inefficient in supporting the processing of small incremental data. While many previous systems have extended MapReduce to perform iterative or incremental computations, these systems are still inefficient and too expensive for large-scale iterative computations on changing data. In this paper, we present a new system called iiHadoop, an extension of the Hadoop framework, optimized for incremental iterative computations. iiHadoop accelerates program execution by performing the incremental computations on the small fraction of data that is affected by changes rather than on the whole data. In addition, iiHadoop improves performance by executing iterations asynchronously and by employing locality-aware scheduling for the map and reduce tasks, taking into account the incremental and iterative behavior. An evaluation of the proposed iiHadoop framework is presented using examples of iterative algorithms, and the results showed significant performance improvements over comparable existing frameworks.
first_indexed	2024-04-12T20:18:13Z
format	Article
id	doaj.art-8329430b4eb744d98995ca2f38b7e690
institution	Directory Open Access Journal
issn	2196-1115
language	English
last_indexed	2024-04-12T20:18:13Z
publishDate	2017-07-01
publisher	SpringerOpen
record_format	Article
series	Journal of Big Data
spelling	doaj.art-8329430b4eb744d98995ca2f38b7e690 | 2022-12-22T03:18:03Z | eng | SpringerOpen | Journal of Big Data | 2196-1115 | 2017-07-01 | 41130 | 10.1186/s40537-017-0086-3 | iiHadoop: an asynchronous distributed framework for incremental iterative computations | Afaf G. Bin Saadon (Faculty of Computers and Information, Cairo University) | Hoda M. O. Mokhtar (Faculty of Computers and Information, Cairo University) | http://link.springer.com/article/10.1186/s40537-017-0086-3
title	iiHadoop: an asynchronous distributed framework for incremental iterative computations
topic	Big data; Distributed systems; Hadoop framework; Iterative processing; Incremental computation
url	http://link.springer.com/article/10.1186/s40537-017-0086-3


Similar Items