Performance Optimization System for Hadoop and Spark Frameworks
The optimization of large-scale data sets depends on the technologies and methods used. The MapReduce model, implemented on Apache Hadoop or Spark, splits large data sets into blocks distributed across several machines. Data compression reduces data size and the transfer time between disks and memory, but requires additional processing. Finding an optimal tradeoff is therefore a challenge, as a high compression factor may underload Input/Output but overload the processor. The paper presents a system enabling the selection of compression tools and the tuning of the compression factor to reach the best performance in Apache Hadoop and Spark infrastructures, based on simulation analyses.
Main Authors: | Astsatryan Hrachya, Kocharyan Aram, Hagimont Daniel, Lalayan Arthur
---|---
Affiliations: | Institute for Informatics and Automation Problems of the National Academy of Sciences of the Republic of Armenia, Yerevan 0014, Armenia; Université Fédérale Toulouse Midi-Pyrénées, Toulouse Cedex 7, France; National Polytechnic University of Armenia, Yerevan 0009, Armenia
Format: | Article
Language: | English
Published: | Sciendo, 2020-12-01
Series: | Cybernetics and Information Technologies
ISSN: | 1314-4081
Subjects: | hadoop; spark; data compression; cpu/io tradeoff; performance optimization
Online Access: | https://doi.org/10.2478/cait-2020-0056
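The abstract describes tuning two knobs, the codec choice and the compression factor, to balance CPU load against I/O volume. As a rough illustration only (not the authors' system, which selects these values from simulation analyses), the Scala sketch below, runnable in spark-shell, sets the standard Spark configuration keys such a system would tune; the zstd level of 3 and the output path are arbitrary placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch, assuming a zstd-capable Spark build: the tunable knobs
// are the codec and the compression level, trading CPU time for I/O volume.
val spark = SparkSession.builder()
  .appName("compression-tradeoff-sketch")
  // Codec for shuffle, broadcast and spill data (lz4, lzf, snappy, zstd).
  .config("spark.io.compression.codec", "zstd")
  // Higher levels compress harder: less I/O, more CPU. "3" is a placeholder.
  .config("spark.io.compression.zstd.level", "3")
  .getOrCreate()

// Storage-side counterpart: pick the codec for persisted Parquet output.
spark.range(0L, 10000000L).toDF("id")
  .write
  .option("compression", "zstd") // e.g., snappy for speed, zstd/gzip for size
  .parquet("/tmp/compression-sketch")

spark.stop()
```

On the Hadoop MapReduce side, the analogous settings are mapreduce.map.output.compress and mapreduce.map.output.compress.codec, which control whether and how intermediate map output is compressed.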