Using HW/SW Codesign for Deep Neural Network Hardware Accelerator Targeting Low-Resources Embedded Processors


Bibliographic Details
Main Authors: Erez Manor, Shlomo Greenberg
Format: Article
Language: English
Published: IEEE, 2022-01-01
Series: IEEE Access
Subjects: HW/SW codesign; SDF Graph; extensible processors; MLPerf Tiny
Online Access: https://ieeexplore.ieee.org/document/9718079/
_version_ 1818492383203950592
author Erez Manor
Shlomo Greenberg
author_facet Erez Manor
Shlomo Greenberg
author_sort Erez Manor
collection DOAJ
description The use of RISC-based embedded processors, aimed at low cost and low power, is becoming an increasingly popular ecosystem for both hardware and software development. High-performance yet low-power embedded processors may be attained through hardware acceleration and Instruction Set Architecture (ISA) extension. Efficiently mapping the computational load onto hardware and software resources is a key challenge for improving performance while keeping power and area low. Furthermore, exploring performance at an early stage of the design makes this challenge more difficult. Potential hardware accelerators can be identified and extracted from the high-level source code by graph analysis that enumerates common patterns. A scheduling algorithm is used to select an optimized subset of accelerators that meets real-time constraints. This paper proposes an efficient hardware/software codesign partitioning methodology applied to a high-level programming language at an early stage of the design. The proposed methodology is based on graph analysis: the applied algorithms are represented by a synchronous dataflow (SDF) graph, and a constraint-driven method with a unique scheduling algorithm is used for graph partitioning to trade off overall speedup against area requirements. The proposed hardware/software partitioning methodology has been evaluated on the MLPerf Tiny benchmark. Experimental results demonstrate a speedup of up to three orders of magnitude compared to a software-only implementation. For example, the runtime of the KWS (Keyword Spotting) software implementation is reduced from 206 s to only 181 ms using the proposed hardware-acceleration approach.
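The constraint-driven partitioning the abstract describes can be sketched roughly as follows. This is a minimal illustrative model, not the authors' implementation: the candidate patterns, their occurrence counts, and the per-accelerator cycle/area numbers are all hypothetical, and a greedy knapsack-style selection stands in for the paper's scheduling algorithm.

```python
# Hypothetical sketch of constraint-driven HW/SW partitioning: given
# candidate accelerator patterns enumerated from a dataflow graph,
# greedily select a subset under an area budget to minimize runtime.
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str        # pattern name, e.g. a fused multiply-accumulate chain
    count: int       # how often the pattern occurs in the graph
    sw_cycles: int   # software cycles per occurrence
    hw_cycles: int   # accelerated cycles per occurrence
    area: float      # relative area cost of the accelerator


def partition(candidates, total_sw_cycles, area_budget):
    """Greedily pick accelerators by cycles-saved-per-area under a budget."""
    saved = lambda c: c.count * (c.sw_cycles - c.hw_cycles)
    chosen, runtime, area = [], total_sw_cycles, 0.0
    for c in sorted(candidates, key=lambda c: saved(c) / c.area, reverse=True):
        if saved(c) > 0 and area + c.area <= area_budget:
            chosen.append(c.name)
            runtime -= saved(c)
            area += c.area
    return chosen, runtime, area


# Illustrative numbers only.
cands = [
    Candidate("mac_chain", count=10_000, sw_cycles=40, hw_cycles=2, area=3.0),
    Candidate("relu_vec",  count=5_000,  sw_cycles=8,  hw_cycles=1, area=1.0),
    Candidate("pool_2x2",  count=2_000,  sw_cycles=20, hw_cycles=4, area=2.5),
]
chosen, runtime, area = partition(cands, total_sw_cycles=600_000, area_budget=4.0)
print(chosen, runtime, area)  # ['mac_chain', 'relu_vec'] 185000 4.0
```

A real flow would derive the candidates from common-subgraph enumeration over the SDF graph and would schedule against actual timing constraints rather than a single cycle budget; the greedy ratio heuristic here is only a stand-in for that selection step.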
first_indexed 2024-12-10T17:42:19Z
format Article
id doaj.art-779f097df6c64c36bfae167a8b92f9c3
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-10T17:42:19Z
publishDate 2022-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-779f097df6c64c36bfae167a8b92f9c3 (2022-12-22T01:39:19Z)
Language: English; Publisher: IEEE; Series: IEEE Access; ISSN: 2169-3536
Published: 2022-01-01; Volume 10, pp. 22274-22287; DOI: 10.1109/ACCESS.2022.3153119; IEEE document 9718079
Title: Using HW/SW Codesign for Deep Neural Network Hardware Accelerator Targeting Low-Resources Embedded Processors
Authors: Erez Manor (https://orcid.org/0000-0002-2708-5628), Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Be'er Sheva, Israel; Shlomo Greenberg (https://orcid.org/0000-0002-1385-8394), Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Be'er Sheva, Israel
Online Access: https://ieeexplore.ieee.org/document/9718079/
Keywords: HW/SW codesign; SDF Graph; extensible processors; MLPerf Tiny
spellingShingle Erez Manor
Shlomo Greenberg
Using HW/SW Codesign for Deep Neural Network Hardware Accelerator Targeting Low-Resources Embedded Processors
IEEE Access
HW/SW codesign
SDF Graph
extensible processors
MLPerf tiny
title Using HW/SW Codesign for Deep Neural Network Hardware Accelerator Targeting Low-Resources Embedded Processors
title_full Using HW/SW Codesign for Deep Neural Network Hardware Accelerator Targeting Low-Resources Embedded Processors
title_fullStr Using HW/SW Codesign for Deep Neural Network Hardware Accelerator Targeting Low-Resources Embedded Processors
title_full_unstemmed Using HW/SW Codesign for Deep Neural Network Hardware Accelerator Targeting Low-Resources Embedded Processors
title_short Using HW/SW Codesign for Deep Neural Network Hardware Accelerator Targeting Low-Resources Embedded Processors
title_sort using hw sw codesign for deep neural network hardware accelerator targeting low resources embedded processors
topic HW/SW codesign
SDF Graph
extensible processors
MLPerf tiny
url https://ieeexplore.ieee.org/document/9718079/
work_keys_str_mv AT erezmanor usinghwswcodesignfordeepneuralnetworkhardwareacceleratortargetinglowresourcesembeddedprocessors
AT shlomogreenberg usinghwswcodesignfordeepneuralnetworkhardwareacceleratortargetinglowresourcesembeddedprocessors