Distributed Distance Join Algorithm for Massive Spatial Data
Spatial distance join is one of the most common operations in spatial data analysis and has many application scenarios. Existing distributed methods suffer from an overly large global space, high data skew, and slow self-joins. To this end, this paper proposes a novel distributed distance join algorithm...
Main Author: | |
---|---|
Format: | Article |
Language: | zho |
Published: | Editorial office of Computer Science, 2022-01-01 |
Series: | Jisuanji kexue |
Subjects: | |
Online Access: | https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-1-95.pdf |
_version_ | 1818995688682291200 |
---|---|
author | WANG Ru-bin, LI Rui-yuan, HE Hua-jun, LIU Tong, LI Tian-rui |
collection | DOAJ |
description | Spatial distance join is one of the most common operations in spatial data analysis and has many application scenarios. Existing distributed methods suffer from an overly large global space, high data skew, and slow self-joins. To this end, this paper proposes a novel distributed distance join algorithm, JUST-Join, for massive spatial data. First, JUST-Join treats only the necessary space as the global domain, which filters out invalid data and reduces the overhead of unnecessary data transmission and computation. Second, it takes the spatial distributions of both datasets into account, which relieves the data skew issue. Third, for the spatial self-join, it adopts the plane sweep method to further improve efficiency. The authors implement JUST-Join on Spark and conduct extensive experiments on real datasets. The experimental results show that JUST-Join outperforms state-of-the-art distributed spatial analysis systems in terms of both efficiency and scalability. |
first_indexed | 2024-12-20T21:17:49Z |
format | Article |
id | doaj.art-e70f5970a4444d74a7165c7984d0933d |
institution | Directory Open Access Journal |
issn | 1002-137X |
language | zho |
last_indexed | 2024-12-20T21:17:49Z |
publishDate | 2022-01-01 |
publisher | Editorial office of Computer Science |
record_format | Article |
series | Jisuanji kexue |
spelling | doaj.art-e70f5970a4444d74a7165c7984d0933d; 2022-12-21T19:26:22Z; zho; Editorial office of Computer Science; Jisuanji kexue; ISSN 1002-137X; 2022-01-01; vol. 49, no. 1, pp. 95-100; DOI 10.11896/jsjkx.210100060; Distributed Distance Join Algorithm for Massive Spatial Data; WANG Ru-bin, LI Rui-yuan, HE Hua-jun, LIU Tong, LI Tian-rui; Affiliations: 1 School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China; 2 College of Computer Science, Chongqing University, Chongqing 400044, China; 3 JD Intelligent Cities Research, Beijing 100176, China; 4 School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China; https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-1-95.pdf; spatial distance join|spatial partition|distributed computing|spatial indexing|spatio-temporal data |
title | Distributed Distance Join Algorithm for Massive Spatial Data |
topic | spatial distance join|spatial partition|distributed computing|spatial indexing|spatio-temporal data |
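
The abstract states that JUST-Join uses a plane sweep for the spatial self-join. The record gives no implementation details, so the following is only a minimal single-machine sketch of a generic plane-sweep distance self-join over 2D points, not the authors' Spark implementation. The function name `distance_self_join`, the tuple-based point type, and the use of Euclidean distance are all assumptions made for illustration.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]  # hypothetical point type: (x, y)


def distance_self_join(points: List[Point], eps: float) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) with i < j whose Euclidean distance is <= eps.

    Generic plane sweep (an illustration, not the paper's code): sort by x,
    then compare each point only with earlier points whose x lies within eps.
    """
    # Visit points in order of increasing x; this is the sweep direction.
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    result = []
    start = 0  # left edge of the active window inside `order`
    for pos, i in enumerate(order):
        xi, yi = points[i]
        # Discard points lying more than eps to the left of the sweep line.
        while points[order[start]][0] < xi - eps:
            start += 1
        # Only points still in the window can be within eps of point i.
        for k in range(start, pos):
            j = order[k]
            xj, yj = points[j]
            # Cheap y check first, exact distance check second.
            if abs(yi - yj) <= eps and hypot(xi - xj, yi - yj) <= eps:
                result.append((min(i, j), max(i, j)))
    return sorted(result)


if __name__ == "__main__":
    sample = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.5), (5.0, 5.0)]
    print(distance_self_join(sample, eps=1.0))
```

Because each pair is examined only once, when the later of its two points is reached by the sweep line, a self-join avoids the duplicate (i, j) and (j, i) comparisons that a general two-dataset join would perform. In a distributed setting a routine like this would presumably run inside each partition, but how JUST-Join partitions the data is not described in this record.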