Distributed Distance Join Algorithm for Massive Spatial Data

Spatial distance join is one of the most common operations in spatial data analysis and has a wide range of application scenarios. Existing distributed methods suffer from an overly large space, high data skew, and slow self-joins. To this end, this paper proposes a novel distributed distance join algorithm...


Bibliographic Details
Main Authors: WANG Ru-bin, LI Rui-yuan, HE Hua-jun, LIU Tong, LI Tian-rui
Format: Article
Language: Chinese (zho)
Published: Editorial office of Computer Science 2022-01-01
Series: Jisuanji kexue
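Subjects: spatial distance join; spatial partition; distributed computing; spatial indexing; spatio-temporal data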
Online Access: https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-1-95.pdf
collection DOAJ
description Spatial distance join is one of the most common operations in spatial data analysis and has a wide range of application scenarios. Existing distributed methods suffer from an overly large space, high data skew, and slow self-joins. To this end, this paper proposes a novel distributed distance join algorithm, JUST-Join, for massive spatial data. First, JUST-Join treats only the necessary space as the global domain, which filters out invalid data and reduces the overhead of unnecessary data transmission and computation. Second, we consider the spatial distributions of both datasets, which relieves the data skew issue. Third, for the spatial self-join, we adopt the plane sweep method to further improve efficiency. We implement JUST-Join on Spark and conduct extensive experiments using real datasets. The experimental results show that JUST-Join outperforms state-of-the-art distributed spatial analysis systems in terms of both efficiency and scalability.
first_indexed 2024-12-20T21:17:49Z
id doaj.art-e70f5970a4444d74a7165c7984d0933d
institution Directory Open Access Journal
issn 1002-137X
last_indexed 2024-12-20T21:17:49Z
volume 49
issue 1
pages 95-100
doi 10.11896/jsjkx.210100060
affiliations 1 School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China; 2 College of Computer Science, Chongqing University, Chongqing 400044, China; 3 JD Intelligent Cities Research, Beijing 100176, China; 4 School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
topic spatial distance join|spatial partition|distributed computing|spatial indexing|spatio-temporal data