Survey of Distributed Computing Frameworks for Supporting Big Data Analysis

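Distributed computing frameworks are the fundamental component of distributed computing systems. They provide an essential way to support the efficient processing of big data on clusters or cloud. The size of big data increases at a pace that is faster than the increase in the big data processing capacity of clusters. Thus, distributed computing frameworks based on the MapReduce computing model are not adequate to support big data analysis tasks which often require running complex analytical algorithms on extremely big data sets in terabytes. In performing such tasks, these frameworks face three challenges: computational inefficiency due to high I/O and communication costs, non-scalability to big data due to memory limit, and limited analytical algorithms because many serial algorithms cannot be implemented in the MapReduce programming model. New distributed computing frameworks need to be developed to conquer these challenges. In this paper, we review MapReduce-type distributed computing frameworks that are currently used in handling big data and discuss their problems when conducting big data analysis. In addition, we present a non-MapReduce distributed computing framework that has the potential to overcome big data analysis challenges.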

Bibliographic Details
Main Authors:	Xudong Sun, Yulin He, Dingming Wu, Joshua Zhexue Huang
Format:	Article
Language:	English
Published:	Tsinghua University Press 2023-06-01
Series:	Big Data Mining and Analytics
Subjects:	distributed computing frameworks; big data analysis; approximate computing; MapReduce computing model
Online Access:	https://www.sciopen.com/article/10.26599/BDMA.2022.9020014

collection	DOAJ
description	Distributed computing frameworks are the fundamental component of distributed computing systems. They provide an essential way to support the efficient processing of big data on clusters or cloud. The size of big data increases at a pace that is faster than the increase in the big data processing capacity of clusters. Thus, distributed computing frameworks based on the MapReduce computing model are not adequate to support big data analysis tasks which often require running complex analytical algorithms on extremely big data sets in terabytes. In performing such tasks, these frameworks face three challenges: computational inefficiency due to high I/O and communication costs, non-scalability to big data due to memory limit, and limited analytical algorithms because many serial algorithms cannot be implemented in the MapReduce programming model. New distributed computing frameworks need to be developed to conquer these challenges. In this paper, we review MapReduce-type distributed computing frameworks that are currently used in handling big data and discuss their problems when conducting big data analysis. In addition, we present a non-MapReduce distributed computing framework that has the potential to overcome big data analysis challenges.
id	doaj.art-b3eae79348174454895ca39ca9012701
institution	Directory of Open Access Journals
issn	2096-0654
volume	6
issue	2
pages	154-169
doi	10.26599/BDMA.2022.9020014
affiliation	College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China (all four authors)

Similar Items