iiHadoop: an asynchronous distributed framework for incremental iterative computations

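Abstract: Data is never static; it keeps growing and changing over time. New data is added, and old data may be modified or deleted. This incremental nature of data motivates the development of new systems that perform large-scale data computations incrementally. MapReduce was recently introduced to provide an efficient approach to large-scale data computations; however, it turned out to be inefficient at processing small incremental data. While many previous systems have extended MapReduce to perform iterative or incremental computations, these systems remain inefficient and too expensive for large-scale iterative computations on changing data. In this paper, we present a new system called iiHadoop, an extension of the Hadoop framework optimized for incremental iterative computations. iiHadoop accelerates program execution by performing incremental computations only on the small fraction of data affected by changes rather than on the whole dataset. In addition, iiHadoop improves performance by executing iterations asynchronously and by employing locality-aware scheduling for map and reduce tasks that takes the incremental and iterative behavior into account. The proposed iiHadoop framework is evaluated using examples of iterative algorithms, and the results show significant performance improvements over comparable existing frameworks.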
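The abstract combines two ideas. The first is to recompute only the part of the data that a change affects. The second is to run iterations asynchronously instead of in lock-step rounds.

This record does not describe iiHadoop's internals. What follows is only an illustrative single-machine sketch of those two ideas applied to PageRank. It uses a delta-based accumulative update, which is a swapped-in technique and not necessarily what iiHadoop implements.

All of the following are hypothetical choices for illustration, not iiHadoop's or Hadoop's API:
- the class `IncrementalPageRank`
- its methods `add_edge` and `run`
- the helper `pagerank_from_scratch`
- the damping factor 0.85 and the tolerance

```python
from collections import deque


class IncrementalPageRank:
    """Hypothetical sketch (not iiHadoop's API): asynchronous, delta-based
    PageRank that absorbs edge insertions without restarting.

    Invariant for every vertex x (unnormalized PageRank, dangling mass dropped):
        value[x] + delta[x] == (1 - d) + d * sum(value[u] / outdeg(u) for u -> x)
    When every pending delta is ~0, value[] is the fixed point.
    """

    def __init__(self, edges, damping=0.85, eps=1e-10):
        self.d, self.eps = damping, eps
        self.out, self.value, self.delta = {}, {}, {}
        self.queue, self.queued = deque(), set()
        self.updates = 0  # counts vertex updates, i.e. work performed
        for u, w in edges:
            self.add_edge(u, w)

    def _push(self, x, amount):
        # Accumulate a pending change; schedule x only if it is significant.
        self.delta[x] += amount
        if abs(self.delta[x]) > self.eps and x not in self.queued:
            self.queued.add(x)
            self.queue.append(x)

    def _add_vertex(self, x):
        if x not in self.out:
            self.out[x], self.value[x], self.delta[x] = [], 0.0, 0.0
            self._push(x, 1.0 - self.d)

    def add_edge(self, u, w):
        """Insert u -> w and repair the invariant locally (incremental step)."""
        self._add_vertex(u)
        self._add_vertex(w)
        old, sent = self.out[u], self.value[u]  # mass u already propagated
        k = len(old)
        if sent:
            # Old neighbours now deserve sent/(k+1) instead of sent/k ...
            for x in old:
                self._push(x, self.d * sent * (1.0 / (k + 1) - 1.0 / k))
            # ... and the new neighbour receives its share.
            self._push(w, self.d * sent / (k + 1))
        old.append(w)

    def run(self):
        """Asynchronous worklist: no global barrier between iterations."""
        while self.queue:
            x = self.queue.popleft()
            self.queued.discard(x)
            dx, self.delta[x] = self.delta[x], 0.0
            self.value[x] += dx
            self.updates += 1
            if self.out[x]:
                share = self.d * dx / len(self.out[x])
                for y in self.out[x]:
                    self._push(y, share)
        return self.value


def pagerank_from_scratch(edges, d=0.85, iters=500):
    """Synchronous (Jacobi) baseline that recomputes everything."""
    verts = {x for e in edges for x in e}
    out = {x: [] for x in verts}
    for u, w in edges:
        out[u].append(w)
    r = {x: 1.0 - d for x in verts}
    for _ in range(iters):
        nr = {x: 1.0 - d for x in verts}
        for u in verts:
            for w in out[u]:
                nr[w] += d * r[u] / len(out[u])
        r = nr
    return r


if __name__ == "__main__":
    pr = IncrementalPageRank([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
    pr.run()
    before = pr.updates
    pr.add_edge("d", "a")  # a small change to the input graph
    pr.run()
    print(pr.updates - before, "vertex updates to absorb the change")
```

After `add_edge`, only vertices whose pending delta exceeds the tolerance are rescheduled, so work tracks the size of the change rather than the size of the graph. The worklist processes each vertex as soon as it has pending work, with no global synchronization between rounds.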


Bibliographic Details
Main Authors: Afaf G. Bin Saadon, Hoda M. O. Mokhtar
Format: Article
Language: English
Published: SpringerOpen 2017-07-01
Series: Journal of Big Data
Subjects: Big data; Distributed systems; Hadoop framework; Iterative processing; Incremental computation
Online Access: http://link.springer.com/article/10.1186/s40537-017-0086-3
Affiliation: Faculty of Computers and Information, Cairo University (both authors)
ISSN: 2196-1115
DOI: 10.1186/s40537-017-0086-3
Collection: DOAJ (Directory of Open Access Journals)