High-Level Synthesis for Semi-Global Matching: Is the Juice Worth the Squeeze?
High-level synthesis (HLS)-based design methodologies are highly viable for industries that are sensitive to production costs. To gain a competitive advantage, it is highly desirable to be able to produce several different implementations of the same algorithm that satisfy a diverse range of resolution, cost, and pe...
Main Authors: | Affaq Qamar, Fahad Bin Muslim, Francesco Gregoretti, Luciano Lavagno, Mihai Teodor Lazarescu |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2017-01-01 |
Series: | IEEE Access |
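Subjects: | High-level synthesis; FPGA; register transfer level (RTL); semi-global matching; DRAM; design space exploration |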
Online Access: | https://ieeexplore.ieee.org/document/7769188/ |
_version_ | 1818370014093246464 |
---|---|
author | Affaq Qamar; Fahad Bin Muslim; Francesco Gregoretti; Luciano Lavagno; Mihai Teodor Lazarescu |
author_sort | Affaq Qamar |
collection | DOAJ |
description | High-level synthesis (HLS)-based design methodologies are highly viable for industries that are sensitive to production costs. To gain a competitive advantage, it is highly desirable to be able to produce several different implementations of the same algorithm that satisfy a diverse range of resolution, cost, and performance constraints. In this paper, we present multiple hardware implementations of the semi-global matching (SGM) algorithm, which is used in stereo vision systems, e.g., for automotive applications. The hardware platform considered in this paper is a Xilinx Zynq system-on-chip. A comparison of the HLS-based design and a manual register transfer level (RTL) design in terms of quality of results, flexibility, and design time is also presented. SGM consists mainly of a sequence of three processing steps: the “cost cube calculation”, followed by the “path cost computation”, and finally the “disparity approximation and minimization”. The path cost processor performs pixel-wise processing of the cost cube data along eight distinct path orientations. The baseline algorithmic model, usually called the “golden” model, uses large arrays that must be mapped to an external DRAM and brought into the on-chip RAM when required. This requires adding memory transfer loops and inserting calls to the AXI transactors that access the DRAM through the on-chip DDR slave. Furthermore, the initial algorithm (typically single-threaded) must be parallelized to fully exploit the concurrency offered by the target hardware platform. The design space exploration was therefore performed by making several considerably different micro-architectural choices. Eventually, we were able to obtain an implementation comparable with the manual RTL design. Both the manual RTL and the HLS designs achieved the target real-time performance of 30 frames/s for an image resolution of 640×480 with a disparity depth of 128 pixels per frame. |
first_indexed | 2024-12-13T23:33:00Z |
format | Article |
id | doaj.art-0e6d17dab5404794b0e302d81efe6d8f |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-13T23:33:00Z |
publishDate | 2017-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-0e6d17dab5404794b0e302d81efe6d8f; 2022-12-21T23:27:22Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2017-01-01; vol. 5, pp. 8419-8432; DOI 10.1109/ACCESS.2016.2635378; IEEE document 7769188; Affaq Qamar (https://orcid.org/0000-0002-4350-4677; Department of Electrical Engineering, Abasyn University, Peshawar, Pakistan); Fahad Bin Muslim (https://orcid.org/0000-0002-4153-360X; Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy); Francesco Gregoretti, Luciano Lavagno, Mihai Teodor Lazarescu (Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy); https://ieeexplore.ieee.org/document/7769188/ |
title | High-Level Synthesis for Semi-Global Matching: Is the Juice Worth the Squeeze? |
topic | High-level synthesis; FPGA; register transfer level (RTL); semi-global matching; DRAM; design space exploration |
url | https://ieeexplore.ieee.org/document/7769188/ |
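The description in this record names a "path cost computation" step that processes the cost cube along eight path orientations, followed by a disparity minimization. As a rough illustration only, the sketch below applies the commonly cited SGM path-cost recurrence along a single path and then picks the lowest-cost disparity per pixel. It is not taken from the paper: the function names, the penalty values `p1` and `p2`, and the list-of-lists layout are all assumptions made for the example, and a hardware design would stream this data rather than hold it in Python lists. The last helper simply multiplies out the workload implied by the quoted figures (640×480 pixels, 128 disparities, eight paths, 30 frames/s).

```python
from typing import List


def aggregate_path(costs: List[List[int]], p1: int, p2: int) -> List[List[int]]:
    """Aggregate matching costs along one path (e.g. left to right on a scanline).

    costs[x][d] is the matching cost of pixel x at disparity d.
    p1 penalizes a disparity change of one level, p2 any larger change
    (both values are assumptions chosen by the caller).
    """
    if not costs:
        return []
    num_disp = len(costs[0])
    # The first pixel on the path has no predecessor: its path cost is its matching cost.
    path = [list(costs[0])]
    for x in range(1, len(costs)):
        prev = path[-1]
        prev_min = min(prev)
        row = []
        for d in range(num_disp):
            # Same disparity as the predecessor, or a large jump at penalty p2.
            best = min(prev[d], prev_min + p2)
            # Neighbouring disparities at the smaller penalty p1.
            if d > 0:
                best = min(best, prev[d - 1] + p1)
            if d + 1 < num_disp:
                best = min(best, prev[d + 1] + p1)
            # Subtracting prev_min keeps the values bounded along long paths.
            row.append(costs[x][d] + best - prev_min)
        path.append(row)
    return path


def winner_take_all(summed: List[List[int]]) -> List[int]:
    """Pick, for every pixel, the disparity index with the lowest cost."""
    return [min(range(len(row)), key=row.__getitem__) for row in summed]


def path_updates_per_second(width: int, height: int, disparities: int,
                            paths: int, fps: int) -> int:
    """Count path-cost updates per second for a given frame format."""
    return width * height * disparities * paths * fps


if __name__ == "__main__":
    # Toy scanline: three pixels, three disparity levels (made-up costs).
    toy_costs = [[0, 5, 5], [5, 0, 5], [5, 5, 0]]
    aggregated = aggregate_path(toy_costs, p1=1, p2=3)
    print(aggregated)
    print(winner_take_all(aggregated))
    print(path_updates_per_second(640, 480, 128, 8, 30))
```

In a full SGM pipeline the per-path results would be summed over all path orientations before the minimization step; the sketch runs the minimization on a single path only to keep the example short.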