Handling data-skewness in character based string similarity join using Hadoop

The scalability of similarity joins is threatened by data skewness, an often unexpected characteristic that is pervasive in scientific data. Skewness distributes attribute values unevenly, which can cause a severe load imbalance. When database join operations are...

Bibliographic Details
Main Authors:	Kanak Meena, Devendra K. Tayal, Oscar Castillo, Amita Jain
Format:	Article
Language:	English
Published:	Emerald Publishing 2022-03-01
Series:	Applied Computing and Informatics
Subjects:	Similarity join; Big data; Hadoop; MapReduce; Data skewness
Online Access:	https://www.emerald.com/insight/content/doi/10.1016/j.aci.2018.11.001/full/pdf

Similar Items

Skewness-Based Partitioning in SpatialHadoop
by: Alberto Belussi, et al.
Published: (2020-03-01)

An analysis of two-way equi-join algorithms under MapReduce
by: Amer F. Al-Badarneh, et al.
Published: (2022-04-01)

Comparative Analysis of Skew-Join Strategies for Large-Scale Datasets with MapReduce and Spark
by: Anh-Cang Phan, et al.
Published: (2022-06-01)

Procesamiento de big data en Hadoop usando el repartition join
by: Néstor Iván Escalante Fol, et al.
Published: (2015-06-01)

Embedding GPU Computations in Hadoop
by: Jie Zhu, et al.
Published: (2014-11-01)

Hadoop Performance Analysis Model with Deep Data Locality
by: Sungchul Lee, et al.
Published: (2019-06-01)

FASTA/Q data compressors for MapReduce-Hadoop genomics: space and time savings made easy
by: Umberto Ferraro Petrillo, et al.
Published: (2021-03-01)

Acerca de la aplicación de MapReduce + Hadoop en el tratamiento de Big Data
by: Antonio Hernández Dominguez, et al.
Published: (2015-07-01)

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
by: Avishan Sharafi, et al.
Published: (2016-12-01)

Big Data: Hadoop framework vulnerabilities, security issues and attacks
by: Gurjit Singh Bhathal, et al.
Published: (2019-01-01)

Clustering large datasets using K-means modified inter and intra clustering (KM-I2C) in Hadoop
by: Chowdam Sreedhar, et al.
Published: (2017-09-01)

Sandbox security model for Hadoop file system
by: Gousiya Begum, et al.
Published: (2020-09-01)

Real-coded multi-objective genetic algorithm with effective queuing model for efficient job scheduling in heterogeneous Hadoop environment
by: V. Seethalakshmi, et al.
Published: (2022-06-01)

Storage-Tag-Aware Scheduler for Hadoop Cluster
by: Nawab Muhammad Faseeh Qureshi, et al.
Published: (2017-01-01)

A comprehensive performance analysis of Apache Hadoop and Apache Spark for large scale data sets using HiBench
by: N. Ahmed, et al.
Published: (2020-12-01)

Analyzing Job Aware Scheduling Algorithm in Hadoop for Heterogeneous Cluster
by: Mayuri A Mehta, et al.
Published: (2015-12-01)

A Hierarchical Hadoop Framework to Process Geo-Distributed Big Data
by: Giuseppe Di Modica, et al.
Published: (2022-01-01)

Job schedulers for Big data processing in Hadoop environment: testing real-life schedulers using benchmark programs
by: Mohd Usama, et al.
Published: (2017-11-01)

Estimating runtime of a job in Hadoop MapReduce
by: Narges Peyravi, et al.
Published: (2020-07-01)

IMapC: Inner MAPping Combiner to Enhance the Performance of MapReduce in Hadoop
by: C. Kavitha, et al.
Published: (2022-05-01)

Replication-Based Query Management for Resource Allocation Using Hadoop and MapReduce over Big Data
by: Ankit Kumar, et al.
Published: (2023-12-01)

A Novel Configuration Tuning Method Based on Feature Selection for Hadoop MapReduce
by: Jun Liu, et al.
Published: (2020-01-01)

A Distributed Video Management Cloud Platform Using Hadoop
by: Xin Liu, et al.
Published: (2015-01-01)

MISFP-Growth: Hadoop-Based Frequent Pattern Mining with Multiple Item Support
by: Chen-Shu Wang, et al.
Published: (2019-05-01)

Big-data Management using Map Reduce on Cloud: Case study, EEG Images' Data
by: Sahar Mahdie Klim, et al.
Published: (2017-03-01)

Power-Law Distributed Graph Generation With MapReduce
by: Renzo Angles, et al.
Published: (2021-01-01)

MapReduce scheduling algorithms in Hadoop: a systematic study
by: Soudabeh Hedayati, et al.
Published: (2023-10-01)

Big Data Analytics for Healthcare Industry: Impact, Applications, and Tools
by: Sunil Kumar, et al.
Published: (2019-03-01)

An Intelligent Metaheuristic Binary Pigeon Optimization-Based Feature Selection and Big Data Classification in a MapReduce Environment
by: Felwa Abukhodair, et al.
Published: (2021-10-01)

Improving MapReduce privacy by implementing multi-dimensional sensitivity-based anonymization
by: Mohammed Al-Zobbi, et al.
Published: (2017-12-01)

Design and analysis of management platform based on financial big data
by: Yuhua Chen, et al.
Published: (2023-03-01)

MR-Tree - A Scalable MapReduce Algorithm for Building Decision Trees
by: Vasile PURDILĂ, et al.
Published: (2014-03-01)

Searching of Chaotic Elements in Hydrology
by: Sorin VLAD, et al.
Published: (2014-03-01)

Experimental Analysis in Hadoop MapReduce: A Closer Look at Fault Detection and Recovery Techniques
by: Muntadher Saadoon, et al.
Published: (2021-05-01)

Indexing strategies of MapReduce for Information Retrieval in Big Data
by: Mazen Farid, et al.
Published: (2016)

LSTPD: Least Slack Time-Based Preemptive Deadline Constraint Scheduler for Hadoop Clusters
by: Ihsan Ullah, et al.
Published: (2020-01-01)

Telecare service activity analysis using Big Data and Data Mining
by: Alfredo Moreno Muñoz, et al.
Published: (2017-01-01)

High-Performance Geospatial Big Data Processing System Based on MapReduce
by: Junghee Jo, et al.
Published: (2018-10-01)

Integrating big data and blockchain to manage energy smart grids—TOTEM framework
by: Dhanya Therese Jose, et al.
Published: (2022-09-01)

Big Data Components for Business Process Optimization
by: Mircea Raducu TRIFU, et al.
Published: (2016-01-01)