Parallel Processing Strategies for Geospatial Data in a Cloud Computing Infrastructure

This paper addresses the optimization of computing resources for processing geospatial image data in a cloud computing infrastructure. Parallelization was tested by combining two strategies: image tiling and multi-threading. The objective was to gain insight into the optimal use of available pro...

Bibliographic Details
Main Authors: Pieter Kempeneers, Tomas Kliment, Luca Marletta, Pierre Soille
Format: Article
Language: English
Published: MDPI AG 2022-01-01
Series: Remote Sensing
Subjects: high-throughput computing; cloud computing; satellite image processing; openEO
Online Access: https://www.mdpi.com/2072-4292/14/2/398
author Pieter Kempeneers
Tomas Kliment
Luca Marletta
Pierre Soille
collection DOAJ
description This paper addresses the optimization of computing resources for processing geospatial image data in a cloud computing infrastructure. Parallelization was tested by combining two strategies: image tiling and multi-threading. The objective was to gain insight into the optimal use of available processing resources in order to minimize processing time. Maximum speedup was obtained by combining tiling and multi-threading. The two techniques are complementary, but they also involve a trade-off. Tiling improves speedup because parts of the image can be processed in parallel. However, reading only part of the image introduces overhead and increases the fraction of the program that can only run serially, which limits the speedup achievable via multi-threading. The optimal combination of tiling and multi-threading depends on the scale of the application (global or local processing area), the implementation of the algorithm (processing libraries), and the available computing resources (amount of memory and number of cores). A medium-sized virtual server obtained from a cloud service provider has rather limited computing resources. There, tiling not only improves speedup but can also be necessary to reduce the memory footprint. However, a tiling scheme with many small tiles increases overhead and can introduce extra latency because queued tiles must wait to be processed. In a high-throughput computing cluster with hundreds of physical processing cores, more tiles can be processed in parallel, and the optimal strategy will differ. This study quantitatively assessed the speedup through a number of experiments in different computing environments, and thereby the potential and limitations of parallel processing by tiling and multi-threading. The experiments were based on an implementation that relies on an application programming interface (API) abstracting platform-specific details, such as those related to data access.
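The trade-off described in the abstract, where a larger serial portion limits the speedup achievable via multi-threading, can be illustrated with Amdahl's law. This is a minimal sketch under stated assumptions: the abstract does not name Amdahl's law, and the function name `amdahl_speedup` and the serial fractions used below are hypothetical values chosen for illustration, not results from the paper.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Ideal speedup under Amdahl's law (a standard model, assumed here).

    serial_fraction: share of the run time that cannot be parallelized,
                     e.g. overhead from reading part of an image per tile.
    workers:         number of threads working on the parallel part.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


if __name__ == "__main__":
    # Hypothetical serial fractions: a small one (few large tiles) and a
    # larger one (many small tiles with more read overhead per tile).
    for serial_fraction in (0.05, 0.20):
        for threads in (1, 4, 16):
            speedup = amdahl_speedup(serial_fraction, threads)
            print(f"serial={serial_fraction:.2f} threads={threads:2d} "
                  f"speedup={speedup:.2f}")
```

Under this model, raising the serial fraction lowers the ceiling on multi-threading speedup regardless of how many cores are added, which is one way to read the abstract's statement that tiling overhead limits the gain from threads.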
first_indexed 2024-03-10T00:35:08Z
format Article
id doaj.art-28e888dc32e34fcdac40a5de088fa5ed
institution Directory Open Access Journal
issn 2072-4292
language English
last_indexed 2024-03-10T00:35:08Z
publishDate 2022-01-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj.art-28e888dc32e34fcdac40a5de088fa5ed
2023-11-23T15:17:06Z
eng
MDPI AG
Remote Sensing
2072-4292
2022-01-01
Volume 14, Issue 2, Article 398
10.3390/rs14020398
Parallel Processing Strategies for Geospatial Data in a Cloud Computing Infrastructure
Pieter Kempeneers (European Commission, Joint Research Centre (JRC), 21027 Ispra, Italy)
Tomas Kliment (Arhs Developments, 4370 Esch-sur-Alzette, Luxembourg)
Luca Marletta (Arhs Developments, 4370 Esch-sur-Alzette, Luxembourg)
Pierre Soille (European Commission, Joint Research Centre (JRC), 21027 Ispra, Italy)
https://www.mdpi.com/2072-4292/14/2/398
high-throughput computing
cloud computing
satellite image processing
openEO
title Parallel Processing Strategies for Geospatial Data in a Cloud Computing Infrastructure
topic high-throughput computing
cloud computing
satellite image processing
openEO
url https://www.mdpi.com/2072-4292/14/2/398