Performance Optimization System for Hadoop and Spark Frameworks


Bibliographic Details
Main Authors: Astsatryan Hrachya, Kocharyan Aram, Hagimont Daniel, Lalayan Arthur
Format: Article
Language: English
Published: Sciendo, 2020-12-01
Series: Cybernetics and Information Technologies
Subjects: hadoop; spark; data compression; cpu/io tradeoff; performance optimization
Online Access: https://doi.org/10.2478/cait-2020-0056
Collection: DOAJ
Description: The optimization of large-scale data sets depends on the technologies and methods used. The MapReduce model, implemented on Apache Hadoop or Spark, allows splitting large data sets into a set of blocks distributed on several machines. Data compression reduces data size and transfer time between disks and memory, but requires additional processing. Therefore, finding an optimal tradeoff is a challenge, as a high compression factor may underload Input/Output but overload the processor. The paper presents a system enabling the selection of compression tools and the tuning of the compression factor to reach the best performance in Apache Hadoop and Spark infrastructures, based on simulation analyses.
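The CPU/IO tradeoff the description refers to can be illustrated with a small simulation, in the spirit of the paper's simulation-based analysis. This is only a sketch, not the authors' system: the zlib codec, the candidate levels 1-9, and the `bandwidth` parameter are all assumptions chosen for illustration.

```python
# Illustrative sketch (not the paper's system): pick a compression factor by
# trading compression CPU time against estimated transfer time. A higher
# level shrinks the data (less I/O) but costs more processor time.
import time
import zlib


def best_compression_level(data: bytes, bandwidth: float) -> int:
    """Return the zlib level (1-9) minimizing compress time + transfer time.

    bandwidth is an assumed disk/network throughput in bytes per second.
    """
    best_level, best_cost = 1, float("inf")
    for level in range(1, 10):
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        cpu_time = time.perf_counter() - start       # processor cost
        transfer_time = len(compressed) / bandwidth  # I/O cost at this size
        cost = cpu_time + transfer_time
        if cost < best_cost:
            best_level, best_cost = level, cost
    return best_level


if __name__ == "__main__":
    sample = b"hadoop spark block " * 100_000  # compressible sample data
    # A slow link favors aggressive compression; a fast link favors light
    # compression, which is exactly the tradeoff the paper's system tunes.
    print(best_compression_level(sample, bandwidth=10e6))
```

Rerunning the sketch with a larger `bandwidth` value tends to select a lower level, since cheap transfer makes the extra CPU time the dominant cost.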
Record ID: doaj.art-12889a3dee384215a31e4c53b35f54b6
Institution: Directory of Open Access Journals
ISSN: 1314-4081
Citation: Cybernetics and Information Technologies, Vol. 20, No. 6 (2020-12-01), pp. 5-17. DOI: 10.2478/cait-2020-0056
Author Affiliations:
Astsatryan Hrachya: Institute for Informatics and Automation Problems of the National Academy of Sciences of the Republic of Armenia, Yerevan 0014, Armenia
Kocharyan Aram: Université Fédérale Toulouse Midi-Pyrénées, Toulouse Cedex 7, France
Hagimont Daniel: Université Fédérale Toulouse Midi-Pyrénées, Toulouse Cedex 7, France
Lalayan Arthur: National Polytechnic University of Armenia, Yerevan 0009, Armenia
Keywords: hadoop; spark; data compression; cpu/io tradeoff; performance optimization