Using HW/SW Codesign for Deep Neural Network Hardware Accelerator Targeting Low-Resources Embedded Processors


Bibliographic Details
Main Authors: Erez Manor, Shlomo Greenberg
Format: Article
Language: English
Published: IEEE, 2022-01-01
Series: IEEE Access
Subjects: HW/SW codesign; SDF Graph; extensible processors; MLPerf Tiny
Online Access: https://ieeexplore.ieee.org/document/9718079/
_version_ 1818492383203950592
author Erez Manor
Shlomo Greenberg
author_facet Erez Manor
Shlomo Greenberg
author_sort Erez Manor
collection DOAJ
description The use of RISC-based embedded processors, aimed at low cost and low power, is becoming an increasingly popular ecosystem for both hardware and software development. High-performance yet low-power embedded processors may be attained through hardware acceleration and Instruction Set Architecture (ISA) extension. Efficiently mapping the computational load onto hardware and software resources is a key challenge for improving performance while keeping power and area low. Furthermore, exploring performance at an early stage of the design makes this challenge more difficult. Potential hardware accelerators can be identified and extracted from the high-level source code by graph analysis that enumerates common patterns. A scheduling algorithm is used to select an optimized subset of accelerators that meets real-time constraints. This paper proposes an efficient hardware/software codesign partitioning methodology applied to a high-level programming language at an early stage of the design. The proposed methodology is based on graph analysis: the applied algorithms are represented by a synchronous dataflow (SDF) graph, and a constraint-driven method with a unique scheduling algorithm is used for graph partitioning to trade off overall speedup against area requirements. The proposed hardware/software partitioning methodology has been evaluated on the MLPerf Tiny benchmark. Experimental results demonstrate a speedup of up to three orders of magnitude compared to a software-only implementation. For example, the runtime of the KWS (Keyword Spotting) software implementation is reduced from 206 s to only 181 ms using the proposed hardware-acceleration approach.
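The constraint-driven partitioning the abstract describes can be sketched roughly as follows. This is a minimal illustrative model, not the authors' implementation: the candidate patterns, their occurrence counts, and the per-accelerator cycle/area numbers are all hypothetical, and a greedy knapsack-style selection stands in for the paper's scheduling algorithm.

```python
# Hypothetical sketch of constraint-driven HW/SW partitioning: given
# candidate accelerator patterns enumerated from a dataflow graph,
# greedily select a subset under an area budget to minimize runtime.
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str        # pattern name, e.g. a fused multiply-accumulate chain
    count: int       # how often the pattern occurs in the graph
    sw_cycles: int   # software cycles per occurrence
    hw_cycles: int   # accelerated cycles per occurrence
    area: float      # relative area cost of the accelerator


def partition(candidates, total_sw_cycles, area_budget):
    """Greedily pick accelerators by cycles-saved-per-area under a budget."""
    saved = lambda c: c.count * (c.sw_cycles - c.hw_cycles)
    chosen, runtime, area = [], total_sw_cycles, 0.0
    for c in sorted(candidates, key=lambda c: saved(c) / c.area, reverse=True):
        if saved(c) > 0 and area + c.area <= area_budget:
            chosen.append(c.name)
            runtime -= saved(c)
            area += c.area
    return chosen, runtime, area


# Illustrative numbers only.
cands = [
    Candidate("mac_chain", count=10_000, sw_cycles=40, hw_cycles=2, area=3.0),
    Candidate("relu_vec",  count=5_000,  sw_cycles=8,  hw_cycles=1, area=1.0),
    Candidate("pool_2x2",  count=2_000,  sw_cycles=20, hw_cycles=4, area=2.5),
]
chosen, runtime, area = partition(cands, total_sw_cycles=600_000, area_budget=4.0)
print(chosen, runtime, area)  # ['mac_chain', 'relu_vec'] 185000 4.0
```

A real flow would derive the candidates from common-subgraph enumeration over the SDF graph and would schedule against actual timing constraints rather than a single cycle budget; the greedy ratio heuristic here is only a stand-in for that selection step.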
first_indexed 2024-12-10T17:42:19Z
format Article
id doaj.art-779f097df6c64c36bfae167a8b92f9c3
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-10T17:42:19Z
publishDate 2022-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-779f097df6c64c36bfae167a8b92f9c3 (2022-12-22T01:39:19Z)
Language: English; Publisher: IEEE; Series: IEEE Access; ISSN: 2169-3536
Published: 2022-01-01; Volume 10, pp. 22274-22287; DOI: 10.1109/ACCESS.2022.3153119; IEEE document 9718079
Title: Using HW/SW Codesign for Deep Neural Network Hardware Accelerator Targeting Low-Resources Embedded Processors
Authors: Erez Manor (https://orcid.org/0000-0002-2708-5628), Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Be'er Sheva, Israel; Shlomo Greenberg (https://orcid.org/0000-0002-1385-8394), Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Be'er Sheva, Israel
Online Access: https://ieeexplore.ieee.org/document/9718079/
Keywords: HW/SW codesign; SDF Graph; extensible processors; MLPerf Tiny
spellingShingle Erez Manor
Shlomo Greenberg
Using HW/SW Codesign for Deep Neural Network Hardware Accelerator Targeting Low-Resources Embedded Processors
IEEE Access
HW/SW codesign
SDF Graph
extensible processors
MLPerf tiny
title Using HW/SW Codesign for Deep Neural Network Hardware Accelerator Targeting Low-Resources Embedded Processors
title_full Using HW/SW Codesign for Deep Neural Network Hardware Accelerator Targeting Low-Resources Embedded Processors
title_fullStr Using HW/SW Codesign for Deep Neural Network Hardware Accelerator Targeting Low-Resources Embedded Processors
title_full_unstemmed Using HW/SW Codesign for Deep Neural Network Hardware Accelerator Targeting Low-Resources Embedded Processors
title_short Using HW/SW Codesign for Deep Neural Network Hardware Accelerator Targeting Low-Resources Embedded Processors
title_sort using hw sw codesign for deep neural network hardware accelerator targeting low resources embedded processors
topic HW/SW codesign
SDF Graph
extensible processors
MLPerf tiny
url https://ieeexplore.ieee.org/document/9718079/
work_keys_str_mv AT erezmanor usinghwswcodesignfordeepneuralnetworkhardwareacceleratortargetinglowresourcesembeddedprocessors
AT shlomogreenberg usinghwswcodesignfordeepneuralnetworkhardwareacceleratortargetinglowresourcesembeddedprocessors