Parallel Processing Strategies for Geospatial Data in a Cloud Computing Infrastructure
This paper is on the optimization of computing resources to process geospatial image data in a cloud computing infrastructure. Parallelization was tested by combining two different strategies: image tiling and multi-threading. The objective here was to get insight on the optimal use of available pro...
Main Authors: | Pieter Kempeneers, Tomas Kliment, Luca Marletta, Pierre Soille |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2022-01-01 |
Series: | Remote Sensing |
Subjects: | high-throughput computing, cloud computing, satellite image processing, openEO |
Online Access: | https://www.mdpi.com/2072-4292/14/2/398 |
_version_ | 1797490588525789184 |
---|---|
author | Pieter Kempeneers Tomas Kliment Luca Marletta Pierre Soille |
author_facet | Pieter Kempeneers Tomas Kliment Luca Marletta Pierre Soille |
author_sort | Pieter Kempeneers |
collection | DOAJ |
description | This paper is on the optimization of computing resources to process geospatial image data in a cloud computing infrastructure. Parallelization was tested by combining two different strategies: image tiling and multi-threading. The objective here was to get insight on the optimal use of available processing resources in order to minimize the processing time. Maximum speedup was obtained when combining tiling and multi-threading techniques. Both techniques are complementary, but a trade-off also exists. Speedup is improved with tiling, as parts of the image can run in parallel. But reading part of the image introduces an overhead and increases the relative part of the program that can only run in serial. This limits speedup that can be achieved via multi-threading. The optimal strategy of tiling and multi-threading that maximizes speedup depends on the scale of the application (global or local processing area), the implementation of the algorithm (processing libraries), and on the available computing resources (amount of memory and cores). A medium-sized virtual server that has been obtained from a cloud service provider has rather limited computing resources. Tiling will not only improve speedup but can be necessary to reduce the memory footprint. However, a tiling scheme with many small tiles increases overhead and can introduce extra latency due to queued tiles that are waiting to be processed. In a high-throughput computing cluster with hundreds of physical processing cores, more tiles can be processed in parallel, and the optimal strategy will be different. A quantitative assessment of the speedup was performed in this study, based on a number of experiments for different computing environments. The potential and limitations of parallel processing by tiling and multi-threading were hereby assessed. Experiments were based on an implementation that relies on an application programming interface (API) abstracting any platform-specific details, such as those related to data access. |
first_indexed | 2024-03-10T00:35:08Z |
format | Article |
id | doaj.art-28e888dc32e34fcdac40a5de088fa5ed |
institution | Directory Open Access Journal |
issn | 2072-4292 |
language | English |
last_indexed | 2024-03-10T00:35:08Z |
publishDate | 2022-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Remote Sensing |
spelling | doaj.art-28e888dc32e34fcdac40a5de088fa5ed; 2023-11-23T15:17:06Z; eng; MDPI AG; Remote Sensing; 2072-4292; 2022-01-01; 14; 2; 398; 10.3390/rs14020398; Parallel Processing Strategies for Geospatial Data in a Cloud Computing Infrastructure; Pieter Kempeneers (0); Tomas Kliment (1); Luca Marletta (2); Pierre Soille (3); European Commission, Joint Research Centre (JRC), 21027 Ispra, Italy; Arhs Developments, 4370 Esch-sur-Alzette, Luxembourg; Arhs Developments, 4370 Esch-sur-Alzette, Luxembourg; European Commission, Joint Research Centre (JRC), 21027 Ispra, Italy; This paper is on the optimization of computing resources to process geospatial image data in a cloud computing infrastructure. Parallelization was tested by combining two different strategies: image tiling and multi-threading. The objective here was to get insight on the optimal use of available processing resources in order to minimize the processing time. Maximum speedup was obtained when combining tiling and multi-threading techniques. Both techniques are complementary, but a trade-off also exists. Speedup is improved with tiling, as parts of the image can run in parallel. But reading part of the image introduces an overhead and increases the relative part of the program that can only run in serial. This limits speedup that can be achieved via multi-threading. The optimal strategy of tiling and multi-threading that maximizes speedup depends on the scale of the application (global or local processing area), the implementation of the algorithm (processing libraries), and on the available computing resources (amount of memory and cores). A medium-sized virtual server that has been obtained from a cloud service provider has rather limited computing resources. Tiling will not only improve speedup but can be necessary to reduce the memory footprint. However, a tiling scheme with many small tiles increases overhead and can introduce extra latency due to queued tiles that are waiting to be processed. In a high-throughput computing cluster with hundreds of physical processing cores, more tiles can be processed in parallel, and the optimal strategy will be different. A quantitative assessment of the speedup was performed in this study, based on a number of experiments for different computing environments. The potential and limitations of parallel processing by tiling and multi-threading were hereby assessed. Experiments were based on an implementation that relies on an application programming interface (API) abstracting any platform-specific details, such as those related to data access.; https://www.mdpi.com/2072-4292/14/2/398; high-throughput computing; cloud computing; satellite image processing; openEO |
spellingShingle | Pieter Kempeneers Tomas Kliment Luca Marletta Pierre Soille Parallel Processing Strategies for Geospatial Data in a Cloud Computing Infrastructure Remote Sensing high-throughput computing cloud computing satellite image processing openEO |
title | Parallel Processing Strategies for Geospatial Data in a Cloud Computing Infrastructure |
title_full | Parallel Processing Strategies for Geospatial Data in a Cloud Computing Infrastructure |
title_fullStr | Parallel Processing Strategies for Geospatial Data in a Cloud Computing Infrastructure |
title_full_unstemmed | Parallel Processing Strategies for Geospatial Data in a Cloud Computing Infrastructure |
title_short | Parallel Processing Strategies for Geospatial Data in a Cloud Computing Infrastructure |
title_sort | parallel processing strategies for geospatial data in a cloud computing infrastructure |
topic | high-throughput computing cloud computing satellite image processing openEO |
url | https://www.mdpi.com/2072-4292/14/2/398 |
work_keys_str_mv | AT pieterkempeneers parallelprocessingstrategiesforgeospatialdatainacloudcomputinginfrastructure AT tomaskliment parallelprocessingstrategiesforgeospatialdatainacloudcomputinginfrastructure AT lucamarletta parallelprocessingstrategiesforgeospatialdatainacloudcomputinginfrastructure AT pierresoille parallelprocessingstrategiesforgeospatialdatainacloudcomputinginfrastructure |