Optimization of the Join between Large Tables in the Spark Distributed Framework

Join tasks between large tables in Spark take a long time to run and generate heavy disk I/O, network I/O, and disk usage during the Shuffle process. This paper proposes a lightweight distributed data filtering model that combines broadcast variables and accumulators using RoaringBitmap. When th...

Bibliographic Details
Main Authors:	Xiang Wu, Yueshun He
Format:	Article
Language:	English
Published:	MDPI AG 2023-05-01
Series:	Applied Sciences
Subjects:	Join; Spark; Shuffle; optimization method; RoaringBitmap
Online Access:	https://www.mdpi.com/2076-3417/13/10/6257

Similar Items

Comparative Analysis of Skew-Join Strategies for Large-Scale Datasets with MapReduce and Spark
by: Anh-Cang Phan, et al.
Published: (2022-06-01)

Accelerating Distributed Repartition Joins on Skewed Datasets via Patch-Based Shuffling
by: Evdokia Kassela, et al.
Published: (2025-01-01)

An Effective High-Performance Multiway Spatial Join Algorithm with Spark
by: Zhenhong Du, et al.
Published: (2017-03-01)

Joining of alumina ceramics with Ti and Zr interlayers by spark plasma sintering
by: Maria Stosz, et al.
Published: (2023-03-01)

Measurement of Encryption Quality of Bitmap Images with RC6, and two modified version Block Cipher
by: Baedaa H. Helal, et al.
Published: (2010-08-01)

Spark plasma welding joining of copper- AISI4140 steel: Microstructures and mechanical properties
by: Mehdi Naderi, et al.
Published: (2023-11-01)

American Sport and the Sports Heroes of the Roaring Twenties
by: Michał Mazurkiewicz
Published: (2014-04-01)

Robustness Analysis of Pin Joining
by: David Römisch, et al.
Published: (2022-10-01)

Roar of a Champion: Loudness and Voice Pitch Predict Perceived Fighting Ability but Not Success in MMA Fighters
by: Pavel Šebesta, et al.
Published: (2019-04-01)

Joining of Oxide Dispersion-Strengthened Steel Using Spark Plasma Sintering
by: Foad Naimi, et al.
Published: (2020-08-01)

Unconventional Materials Processing Using Spark Plasma Sintering
by: Ambreen Nisar, et al.
Published: (2021-01-01)

Building method of distributed COW disk on network computing environment
by: Huai-liang TAN, et al.
Published: (2012-07-01)

Du « clan divin des femmes amoureuses » à la « race maudite » : élaboration, représentations et discontinuités de l’identité lesbienne dans la trajectoire de Mireille Havet (1898-1932)
by: Emmanuelle Rétaillaud-Bajac
Published: (2009-07-01)

Transfer Points: Artistic Intersections and Cultural Transitions in John Dos Passos’s Fiction of the 1920s
by: Robert MCPARLAND
Published: (2020-10-01)

Joins, Secant Varieties and Their Associated Grassmannians
by: Edoardo Ballico
Published: (2024-04-01)

Supporting Efficient Family Joins for Big Data Tables via Multiple Freedom Family Index
by: Qiang Zhu, et al.
Published: (2025-01-01)

Design of a data processing method for the farmland environmental monitoring based on improved Spark components
by: Ruipeng Tang, et al.
Published: (2023-11-01)

Sparking plugs
by: British Standards Institution

Spark plugs
by: SIRIM

Statistics of location of sparks on circular electrodes
by: Ball, Edwin D.

Promoting and Containing New Womanhood in the Pages of Photoplay: The Case Of “Little Mary” Pickford and Her Mediated Alter Egos on the Cusp of the Roaring Twenties
by: Kylo-Patrick R. HART
Published: (2020-10-01)

Modeling of spark ignition engines
by: Society of Automotive Engineers, et al.
Published: (2004)

Femtosecond Laser-Induced Nano-Joining of Volatile Tellurium Nanotube Memristor
by: Yongchao Yu, et al.
Published: (2023-02-01)

Spectra of R-Vertex Join and R-Edge Join of Two Graphs
by: Das Arpita, et al.
Published: (2018-06-01)

Spark Timing Optimization through Co-Simulation Analysis in a Spark Ignition Engine
by: Ivan Arsie, et al.
Published: (2024-07-01)

A Distributed Quantum-Behaved Particle Swarm Optimization Using Opposition-Based Learning on Spark for Large-Scale Optimization Problem
by: Zhaojuan Zhang, et al.
Published: (2020-10-01)

Spark plugs
Published: (1970)

Tube Joining by a Sheet Flange Connection
by: Rafael M. Afonso, et al.
Published: (2022-12-01)

Joint design of tuple space and bitmap for two-dimensional packet classification
by: XIE Kun, et al.
Published: (2011-01-01)

BITMAP INDEX: A DATA STRUCTURE FOR FAST FILE RETRIEVAL
by: Murtadha M. Hamad
Published: (2008-04-01)

IMAGE WAVELETE COMPRESSION USING SHIFT CODING AND BITMAP SLICING
by: Samyia S. Lazar, et al.
Published: (2008-04-01)

Join Spaces and Lattices
by: Violeta Leoreanu-Fotea, et al.
Published: (2024-10-01)

Aeroelastic Optimization Design of the Global Stiffness for a Joined Wing Aircraft
by: Xuyang Li, et al.
Published: (2021-12-01)

ReJOOSp: Reinforcement Learning for Join Order Optimization in SPARQL
by: Benjamin Warnke, et al.
Published: (2024-06-01)

Investigations to improve the tool life during thermomechanical and incremental forming of steel auxiliary joining elements
by: T Borgert, et al.
Published: (2024-06-01)

Disappearance of the Self and Its Constitutive Outside in Kafka and Woody Allen’s Zelig
by: Lukas MOZDEIKA
Published: (2020-10-01)

Thermomechanical Joining of Hypoeutectic Aluminium Cast Plates
by: Thomas Borgert, et al.
Published: (2023-09-01)

Monophonic Distance Energy for Join of Some Graphs
by: V S Sinju Manohar, et al.
Published: (2023-01-01)