ReS<sup>2</sup>tAC—UAV-Borne Real-Time SGM Stereo Optimized for Embedded ARM and CUDA Devices

With the emergence of low-cost robotic systems, such as unmanned aerial vehicles, the importance of embedded high-performance image processing has increased. For a long time, FPGAs were the only processing hardware capable of high-performance computing while at the same time preserving a l...

Bibliographic Details
Main Authors: Boitumelo Ruf, Jonas Mohrs, Martin Weinmann, Stefan Hinz, Jürgen Beyerer
Format: Article
Language: English
Published: MDPI AG 2021-06-01
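Series: Sensors
Subjects: embedded stereo vision, real-time stereo processing, disparity estimation, semi-global matching, GPGPU, SIMD
Online Access: https://www.mdpi.com/1424-8220/21/11/3938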
collection DOAJ
description With the emergence of low-cost robotic systems, such as unmanned aerial vehicles, the importance of embedded high-performance image processing has increased. For a long time, FPGAs were the only processing hardware capable of high-performance computing while preserving the low power consumption that is essential for embedded systems. However, the increasing availability of embedded GPU-based systems, such as the NVIDIA Jetson series, which combine an ARM CPU with an NVIDIA Tegra GPU, now allows for massively parallel embedded computing on graphics hardware. With this in mind, we propose an approach for real-time embedded stereo processing on ARM and CUDA-enabled devices, which is based on the popular and widely used Semi-Global Matching algorithm. Specifically, we optimize the algorithm for embedded CUDA GPUs by means of massively parallel computing, and we use NEON intrinsics to optimize it for vectorized SIMD processing on embedded ARM CPUs. We have evaluated our approach with different configurations on two public stereo benchmark datasets, demonstrating that it can reach an error rate as low as 3.3%. Furthermore, our experiments show that the fastest configuration of our approach reaches up to 46 FPS at VGA image resolution. Finally, in a use-case-specific qualitative evaluation, we have evaluated the power consumption of our approach and deployed it on the DJI Manifold 2-G attached to a DJI Matrice 210v2 RTK unmanned aerial vehicle (UAV), demonstrating its suitability for real-time stereo processing onboard a UAV.
first_indexed 2024-03-10T10:37:56Z
id doaj.art-4938b5d59a0a4e45928a6a461b1d86d7
institution Directory of Open Access Journals
issn 1424-8220
last_indexed 2024-03-10T10:37:56Z
spelling doaj.art-4938b5d59a0a4e45928a6a461b1d86d7
2023-11-21T23:09:53Z
Sensors, volume 21, issue 11, article 3938
DOI: 10.3390/s21113938
Boitumelo Ruf: Fraunhofer Center for Machine Learning, Fraunhofer Institute of Optronics, System Technologies and Image Exploitation (IOSB), 76131 Karlsruhe, Germany
Jonas Mohrs: Fraunhofer Center for Machine Learning, Fraunhofer Institute of Optronics, System Technologies and Image Exploitation (IOSB), 76131 Karlsruhe, Germany
Martin Weinmann: Institute of Photogrammetry and Remote Sensing, Karlsruhe Institute of Technology (KIT), 76131 Karlsruhe, Germany
Stefan Hinz: Institute of Photogrammetry and Remote Sensing, Karlsruhe Institute of Technology (KIT), 76131 Karlsruhe, Germany
Jürgen Beyerer: Fraunhofer Center for Machine Learning, Fraunhofer Institute of Optronics, System Technologies and Image Exploitation (IOSB), 76131 Karlsruhe, Germany
topic embedded stereo vision
real-time stereo processing
disparity estimation
semi-global matching
GPGPU
SIMD