An Efficient Group-Based Replica Placement Policy for Large-Scale Geospatial 3D Raster Data on Hadoop

Geospatial three-dimensional (3D) raster data have been widely used for simple representations and analysis, such as geological models, spatio-temporal satellite data, hyperspectral images, and climate data. With the increasing requirements of resolution and accuracy, the amount of geospatial 3D ras...

Bibliographic Details
Main Authors:	Zhipeng Liu, Weihua Hua, Xiuguo Liu, Dong Liang, Yabo Zhao, Manxing Shi
Format:	Article
Language:	English
Published:	MDPI AG 2021-12-01
Series:	Sensors
Subjects:	3D raster; distributed GIS; Hadoop Distributed File System; replica placement
Online Access:	https://www.mdpi.com/1424-8220/21/23/8132

Volume/Issue:	21 (23), article 8132
DOI:	10.3390/s21238132
ISSN:	1424-8220
Affiliation:	School of Geography and Information Engineering, China University of Geosciences, Wuhan 430074, China (all authors)
Collection:	DOAJ
Institution:	Directory Open Access Journal
Record ID:	doaj.art-50533ba52bb54259b2769f012e09f459
First Indexed:	2024-03-10T04:44:18Z
Last Indexed:	2024-03-10T04:44:18Z
Description:	Geospatial three-dimensional (3D) raster data have been widely used for simple representations and analysis, such as geological models, spatio-temporal satellite data, hyperspectral images, and climate data. With the increasing requirements of resolution and accuracy, the amount of geospatial 3D raster data has grown exponentially. In recent years, the processing of large raster data using Hadoop has gained popularity. However, data uploaded to Hadoop are randomly distributed onto datanodes without consideration of the spatial characteristics. As a result, the direct processing of geospatial 3D raster data produces a massive network data exchange among the datanodes and degrades the performance of the cluster. To address this problem, we propose an efficient group-based replica placement policy for large-scale geospatial 3D raster data, aiming to optimize the locations of the replicas in the cluster to reduce the network overhead. An overlapped group scheme was designed for three replicas of each file. The data in each group were placed in the same datanode, and different colocation patterns for three replicas were implemented to further reduce the communication between groups. The experimental results show that our approach significantly reduces the network overhead during data acquisition for 3D raster data in the Hadoop cluster, and maintains the Hadoop replica placement requirements.

Similar Items