High-Level Synthesis for Semi-Global Matching: Is the Juice Worth the Squeeze?

Bibliographic Details
Main Authors:	Affaq Qamar, Fahad Bin Muslim, Francesco Gregoretti, Luciano Lavagno, Mihai Teodor Lazarescu
Format:	Article
Language:	English
Published:	IEEE 2017-01-01
Series:	IEEE Access
Subjects:	High-level synthesis; FPGA; register transfer level (RTL); semi-global matching; DRAM; design space exploration
Online Access:	https://ieeexplore.ieee.org/document/7769188/
ISSN:	2169-3536
Volume:	5
Pages:	8419-8432
DOI:	10.1109/ACCESS.2016.2635378
Affiliations:	Affaq Qamar: Department of Electrical Engineering, Abasyn University, Peshawar, Pakistan; Fahad Bin Muslim, Francesco Gregoretti, Luciano Lavagno, Mihai Teodor Lazarescu: Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy
ORCID:	Affaq Qamar https://orcid.org/0000-0002-4350-4677; Fahad Bin Muslim https://orcid.org/0000-0002-4153-360X

Abstract
High-level synthesis (HLS)-based design methodologies are extremely viable for industries that are sensitive to production costs. To gain a competitive advantage, it is highly desirable to be able to produce several different implementations of the same algorithm that satisfy a diverse range of resolution, cost, and performance constraints. In this paper, we present multiple hardware implementations of the semi-global matching (SGM) algorithm, which is used in stereo vision systems, e.g., for automotive applications. The hardware platform considered in this paper is a Xilinx Zynq system-on-chip. We also compare an HLS-based design with a manual register transfer level (RTL) design in terms of quality of results, flexibility, and design time. SGM consists mainly of a sequence of three processing steps: the “cost cube calculation”, followed by the “path cost computation”, and finally the “disparity approximation and minimization”. The path cost processor processes the cost cube data pixel by pixel along eight distinct path orientations.
The baseline algorithmic model, usually called the “golden” model, uses very large arrays that must be mapped to an external DRAM and brought into the on-chip RAM when required. This requires adding memory transfer loops and inserting calls to the AXI transactors that access the DRAM through the on-chip DDR slave. Furthermore, the initial algorithm (typically single-threaded) must be parallelized to fully exploit the concurrency offered by the target hardware platform. We therefore performed design space exploration over several considerably different micro-architectural choices. Eventually, we obtained an HLS implementation comparable with the manual RTL design. Both the manual RTL and the HLS designs achieved the target real-time performance of 30 frames/s for an image resolution of 640×480 with a disparity depth of 128 pixels per frame.
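The three SGM steps named in the abstract can be illustrated with a small, deliberately unoptimized Python sketch. It is not the authors' HLS or RTL design: it assumes an absolute-difference matching cost, penalties P1 = 1 and P2 = 4, a placeholder cost (`INVALID_COST`) for disparities that fall outside the right image, and plain winner-takes-all selection standing in for the "disparity approximation and minimization" step, whose details the abstract does not give. The path-cost recurrence is the one commonly used in Hirschmüller's SGM formulation; all function and constant names are hypothetical.

```python
"""Minimal semi-global matching (SGM) sketch: cost cube, 8-path aggregation, WTA."""

INVALID_COST = 1000  # hypothetical cost for disparities that fall outside the right image

# The eight path orientations r = (dy, dx) along which path costs are aggregated.
DIRECTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0),
              (1, 1), (1, -1), (-1, 1), (-1, -1)]


def cost_cube(left, right, max_disp):
    """Cost cube C[y][x][d] = |left(y, x) - right(y, x - d)| (assumed absolute-difference cost)."""
    h, w = len(left), len(left[0])
    return [[[abs(left[y][x] - right[y][x - d]) if x - d >= 0 else INVALID_COST
              for d in range(max_disp)]
             for x in range(w)]
            for y in range(h)]


def aggregate_path(cost, dy, dx, p1, p2):
    """Path cost L_r(p, d) along r = (dy, dx) with the standard SGM recurrence."""
    h, w, nd = len(cost), len(cost[0]), len(cost[0][0])
    agg = [[None] * w for _ in range(h)]
    # Traverse so that the predecessor p - r is always computed before p.
    ys = range(h) if dy >= 0 else range(h - 1, -1, -1)
    xs = range(w) if dx >= 0 else range(w - 1, -1, -1)
    for y in ys:
        for x in xs:
            c = cost[y][x]
            py, px = y - dy, x - dx
            if not (0 <= py < h and 0 <= px < w):
                agg[y][x] = list(c)  # first pixel on this path: L_r = C
                continue
            prev = agg[py][px]
            prev_min = min(prev)
            row = []
            for d in range(nd):
                best = min(
                    prev[d],                                           # same disparity at p - r
                    prev[d - 1] + p1 if d > 0 else float("inf"),       # disparity d - 1 at p - r
                    prev[d + 1] + p1 if d + 1 < nd else float("inf"),  # disparity d + 1 at p - r
                    prev_min + p2,                                     # any larger jump
                )
                # Subtracting prev_min keeps the values bounded along long paths.
                row.append(c[d] + best - prev_min)
            agg[y][x] = row
    return agg


def sgm_disparity(left, right, max_disp, p1=1, p2=4):
    """Sum the eight path costs and select the lowest-cost disparity (winner-takes-all)."""
    cost = cost_cube(left, right, max_disp)
    h, w = len(cost), len(cost[0])
    total = [[[0] * max_disp for _ in range(w)] for _ in range(h)]
    for dy, dx in DIRECTIONS:
        agg = aggregate_path(cost, dy, dx, p1, p2)
        for y in range(h):
            for x in range(w):
                for d in range(max_disp):
                    total[y][x][d] += agg[y][x][d]
    return [[min(range(max_disp), key=lambda d: total[y][x][d]) for x in range(w)]
            for y in range(h)]


# Synthetic pair: the right image is the left image shifted by 2 pixels.
def f(y, x):
    return 10 * x + 3 * y


left = [[f(y, x) for x in range(10)] for y in range(4)]
right = [[f(y, x + 2) for x in range(10)] for y in range(4)]
disp = sgm_disparity(left, right, max_disp=4)
print(disp[0][3:])  # [2, 2, 2, 2, 2, 2, 2]
```

Because every path term adds at most P2 on top of the matching cost, the correct disparity wins here wherever all four candidate disparities are valid (x ≥ 3). In a hardware implementation, the per-pixel loop over `d` and the eight independent `aggregate_path` calls are the natural places to exploit parallelism; the sketch keeps them sequential for readability.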
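The real-time target quoted in the abstract (30 frames/s at 640×480 with 128 disparities) can be turned into a rough workload estimate. The sketch below assumes one byte per cost value and a hypothetical 100 MHz fabric clock; neither figure comes from the paper. It counts one "path update" as one evaluation of the SGM path-cost recurrence for a single pixel, disparity, and path orientation.

```python
"""Back-of-the-envelope workload for the abstract's real-time SGM target."""
import math

WIDTH, HEIGHT = 640, 480   # image resolution from the abstract
DISPARITIES = 128          # disparity depth from the abstract
FPS = 30                   # target frame rate from the abstract
PATHS = 8                  # path orientations aggregated by SGM

# Assumptions (not from the paper): 1 byte per cost value, 100 MHz fabric clock.
BYTES_PER_COST = 1
CLOCK_HZ = 100_000_000

pixels_per_s = WIDTH * HEIGHT * FPS
cost_values_per_frame = WIDTH * HEIGHT * DISPARITIES
cost_values_per_s = cost_values_per_frame * FPS
path_updates_per_s = cost_values_per_s * PATHS
cube_bytes = cost_values_per_frame * BYTES_PER_COST
# Minimum number of path updates the datapath must complete per clock cycle.
updates_per_cycle = math.ceil(path_updates_per_s / CLOCK_HZ)

print(f"pixels/s:            {pixels_per_s:,}")
print(f"cost cube per frame: {cost_values_per_frame:,} values ({cube_bytes / 2**20:.1f} MiB)")
print(f"path updates/s:      {path_updates_per_s:,}")
print(f"updates per cycle at {CLOCK_HZ / 1e6:.0f} MHz: {updates_per_cycle}")
```

Under these assumptions it reports 9,216,000 pixels/s, a 37.5 MiB cost cube per frame if the full cube were stored, 9,437,184,000 path updates/s, and at least 95 path updates per clock cycle. Numbers of this size illustrate why the golden model's arrays are mapped to external DRAM and why the single-threaded algorithm has to be parallelized.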
