Optimization of the Join between Large Tables in the Spark Distributed Framework
Join tasks between large tables in Spark take a long time to run, and their Shuffle process generates heavy disk I/O and network I/O and occupies considerable disk space. This paper proposes a lightweight distributed data filtering model that combines broadcast variables and accumulators using RoaringBitmap. When th...
Main Authors: | Xiang Wu, Yueshun He |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-05-01 |
Series: | Applied Sciences |
Subjects: | |
Online Access: | https://www.mdpi.com/2076-3417/13/10/6257 |
Similar Items
- Comparative Analysis of Skew-Join Strategies for Large-Scale Datasets with MapReduce and Spark
  by: Anh-Cang Phan, et al.
  Published: (2022-06-01)
- Accelerating Distributed Repartition Joins on Skewed Datasets via Patch-Based Shuffling
  by: Evdokia Kassela, et al.
  Published: (2025-01-01)
- An Effective High-Performance Multiway Spatial Join Algorithm with Spark
  by: Zhenhong Du, et al.
  Published: (2017-03-01)
- Joining of alumina ceramics with Ti and Zr interlayers by spark plasma sintering
  by: Maria Stosz, et al.
  Published: (2023-03-01)
- Measurement of Encryption Quality of Bitmap Images with RC6, and two modified version Block Cipher
  by: Baedaa H. Helal, et al.
  Published: (2010-08-01)