High-Level Synthesis for Semi-Global Matching: Is the Juice Worth the Squeeze?

Bibliographic Details
Main Authors: Affaq Qamar, Fahad Bin Muslim, Francesco Gregoretti, Luciano Lavagno, Mihai Teodor Lazarescu
Format: Article
Language: English
Published: IEEE, 2017-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/7769188/

Description
High-level synthesis (HLS)-based design methodologies are highly viable for industries that are sensitive to production costs. To gain a competitive advantage, it is highly desirable to be able to produce several different implementations of the same algorithm that satisfy a diverse range of resolution, cost, and performance constraints. In this paper, we present multiple hardware implementations of the semi-global matching (SGM) algorithm, which is used in stereo vision systems, e.g., for automotive applications. The hardware platform considered in this paper is a Xilinx Zynq system-on-chip. We also compare the HLS-based designs with a manual register transfer level (RTL) design in terms of quality of results, flexibility, and design time. SGM mainly consists of a sequence of three processing steps: the “cost cube calculation”, followed by the “path cost computation”, and finally the “disparity approximation and minimization”. The path cost processor performs pixel-wise processing of the cost cube data along eight distinct path orientations. The baseline algorithmic model, usually called the “golden” model, uses very large arrays that must be mapped to an external DRAM and brought into the on-chip RAM when required. This requires adding memory transfer loops as well as calls to the AXI transactors that access the DRAM through the on-chip DDR slave. Furthermore, the initial algorithm (typically single-threaded) must be parallelized to fully exploit the concurrency offered by the target hardware platform. We therefore performed design space exploration over several substantially different micro-architectural choices. Eventually, we obtained an implementation comparable with the manual RTL design.
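The three SGM steps named above can be illustrated with a small software model. This is a minimal sketch, not the authors' golden model: the absolute-difference matching cost, the `INVALID_COST` constant, the penalty values `p1`/`p2`, and all function names are hypothetical choices for illustration. The path recurrence follows Hirschmüller's original SGM formulation, and the final step is reduced here to summing the eight path costs and taking the per-pixel minimum (no sub-pixel refinement).

```python
"""Minimal software model of the three SGM stages (illustrative sketch).

Assumptions, not taken from the paper: an absolute-difference matching cost,
the INVALID_COST constant, and the penalties P1/P2 passed by the caller.
The path recurrence follows Hirschmueller's original SGM formulation.
"""

# The eight path orientations r = (dx, dy) aggregated by SGM.
DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1),
              (1, 1), (-1, 1), (1, -1), (-1, -1)]

INVALID_COST = 255  # hypothetical cost when pixel x - d lies outside the image


def cost_cube(left, right, max_disp):
    """Stage 1: C[y][x][d] = |left(x, y) - right(x - d, y)|."""
    h, w = len(left), len(left[0])
    return [[[abs(left[y][x] - right[y][x - d]) if x - d >= 0 else INVALID_COST
              for d in range(max_disp)]
             for x in range(w)]
            for y in range(h)]


def path_cost(cost, dx, dy, p1, p2):
    """Stage 2 for one orientation r = (dx, dy):
    L_r(p, d) = C(p, d) + min(L_r(p-r, d), L_r(p-r, d-1) + P1,
                              L_r(p-r, d+1) + P1, min_k L_r(p-r, k) + P2)
                - min_k L_r(p-r, k)
    """
    h, w, nd = len(cost), len(cost[0]), len(cost[0][0])
    agg = [[None] * w for _ in range(h)]
    # Scan order guarantees the predecessor p - r is computed before p.
    ys = range(h) if dy >= 0 else range(h - 1, -1, -1)
    xs = range(w) if dx >= 0 else range(w - 1, -1, -1)
    inf = float("inf")
    for y in ys:
        for x in xs:
            px, py = x - dx, y - dy
            c = cost[y][x]
            if not (0 <= px < w and 0 <= py < h):
                agg[y][x] = list(c)  # the path enters the image here
                continue
            prev = agg[py][px]
            prev_min = min(prev)
            agg[y][x] = [
                c[d] + min(prev[d],
                           prev[d - 1] + p1 if d > 0 else inf,
                           prev[d + 1] + p1 if d + 1 < nd else inf,
                           prev_min + p2) - prev_min
                for d in range(nd)
            ]
    return agg


def sgm_disparity(left, right, max_disp, p1, p2):
    """Stage 3: sum L_r over the eight orientations, then pick the minimum per pixel."""
    cost = cost_cube(left, right, max_disp)
    h, w = len(cost), len(cost[0])
    total = [[[0] * max_disp for _ in range(w)] for _ in range(h)]
    for dx, dy in DIRECTIONS:
        agg = path_cost(cost, dx, dy, p1, p2)
        for y in range(h):
            for x in range(w):
                for d in range(max_disp):
                    total[y][x][d] += agg[y][x][d]
    return [[min(range(max_disp), key=lambda d: total[y][x][d]) for x in range(w)]
            for y in range(h)]


if __name__ == "__main__":
    # Synthetic pair: the left image is the right image shifted by 2 pixels.
    right = [[10 * ((3 * x + y) % 7) for x in range(8)] for y in range(4)]
    left = [[right[y][x - 2] if x >= 2 else 0 for x in range(8)] for y in range(4)]
    for row in sgm_disparity(left, right, max_disp=4, p1=1, p2=4):
        print(row)  # columns x >= 2 report disparity 2
```

The nested lists only fix the reference arithmetic. At full resolution the cost cube would not fit in on-chip memory, which is consistent with the abstract's point that the golden model's arrays are mapped to external DRAM and that the per-pixel, per-path loops are what a hardware design has to parallelize.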
Both the manual RTL and the HLS designs achieved the target real-time performance of 30 frames/s at an image resolution of 640×480 pixels with a disparity depth of 128 pixels.
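The real-time target translates into a concrete workload. The back-of-the-envelope calculation below is a sketch under stated assumptions: the one-byte cost width (`BYTES_PER_COST`) and the 200 MHz clock in the example call are hypothetical values chosen for illustration, not figures from the paper, and the count covers only one evaluation of each path cost.

```python
"""Back-of-the-envelope SGM workload for the 640x480, 128-disparity, 30 frames/s target.

BYTES_PER_COST and the clock frequency used below are illustrative assumptions.
"""

WIDTH, HEIGHT = 640, 480   # image resolution from the abstract
DISPARITIES = 128          # disparity depth from the abstract
FPS = 30                   # real-time target from the abstract
PATHS = 8                  # path orientations aggregated by SGM
BYTES_PER_COST = 1         # assumption: 8-bit matching costs


def cost_entries_per_frame():
    """Number of C(p, d) values in one cost cube."""
    return WIDTH * HEIGHT * DISPARITIES


def path_updates_per_second():
    """L_r(p, d) evaluations per second across all eight orientations."""
    return cost_entries_per_frame() * PATHS * FPS


def cube_size_mib():
    """Cost-cube footprint in MiB under the BYTES_PER_COST assumption."""
    return cost_entries_per_frame() * BYTES_PER_COST / 2**20


def updates_per_cycle(clock_hz):
    """Path-cost updates that must complete in every clock cycle."""
    return path_updates_per_second() / clock_hz


if __name__ == "__main__":
    print(cost_entries_per_frame())   # 39321600
    print(path_updates_per_second())  # 9437184000
    print(cube_size_mib())            # 37.5
    print(updates_per_cycle(200e6))   # about 47.19 at a hypothetical 200 MHz
```

Even under these assumptions, the cube occupies tens of MiB per frame and roughly 47 path-cost updates must finish in every cycle, which illustrates why the golden model's arrays go to DRAM and why the single-threaded model has to be parallelized.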

Additional Details
ISSN: 2169-3536
Volume: 5
Pages: 8419-8432
DOI: 10.1109/ACCESS.2016.2635378
Subjects: High-level synthesis; FPGA; register transfer level (RTL); semi-global matching; DRAM; design space exploration
Author Affiliations:
Affaq Qamar (ORCID 0000-0002-4350-4677): Department of Electrical Engineering, Abasyn University, Peshawar, Pakistan
Fahad Bin Muslim (ORCID 0000-0002-4153-360X): Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy
Francesco Gregoretti: Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy
Luciano Lavagno: Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy
Mihai Teodor Lazarescu: Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy
Collection: Directory of Open Access Journals (DOAJ)