Using HW/SW Codesign for Deep Neural Network Hardware Accelerator Targeting Low-Resources Embedded Processors
The usage of RISC-based embedded processors, aimed at low cost and low power, is becoming an increasingly popular ecosystem for both hardware and software development. High performance yet low power embedded processors may be attained via the use of hardware acceleration and Instruction Set Architec...
Main Authors: | Erez Manor, Shlomo Greenberg |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2022-01-01 |
Series: | IEEE Access |
Subjects: | HW/SW codesign; SDF Graph; extensible processors; MLPerf tiny |
Online Access: | https://ieeexplore.ieee.org/document/9718079/ |
_version_ | 1818492383203950592 |
---|---|
author | Erez Manor; Shlomo Greenberg |
author_sort | Erez Manor |
collection | DOAJ |
description | RISC-based embedded processors, aimed at low cost and low power, are becoming an increasingly popular ecosystem for both hardware and software development. High-performance yet low-power embedded processors may be attained through hardware acceleration and Instruction Set Architecture (ISA) extension. Efficient mapping of the computational load onto hardware and software resources is a key challenge for improving performance while keeping power and area low. Furthermore, exploring performance at an early stage of the design makes this challenge more difficult. Potential hardware accelerators can be identified and extracted from the high-level source code by graph analysis that enumerates common patterns. A scheduling algorithm is used to select an optimized subset of accelerators that meets real-time constraints. This paper proposes an efficient hardware/software codesign partitioning methodology applied to high-level programming language code at an early stage of the design. The proposed methodology is based on graph analysis. The applied algorithms are represented by a synchronous directed acyclic graph. A constraint-driven method and a unique scheduling algorithm are used for graph partitioning to obtain the overall speedup and area requirements. The proposed hardware/software partitioning methodology has been evaluated on the MLPerf Tiny benchmark. Experimental results demonstrate a speedup of up to 3 orders of magnitude compared to a software-only implementation. For example, the runtime of the KWS (Keyword Spotting) implementation is reduced from 206 s in software to only 181 ms using the proposed hardware-acceleration approach. |
first_indexed | 2024-12-10T17:42:19Z |
format | Article |
id | doaj.art-779f097df6c64c36bfae167a8b92f9c3 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-10T17:42:19Z |
publishDate | 2022-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-779f097df6c64c36bfae167a8b92f9c3; 2022-12-22T01:39:19Z; eng; IEEE; IEEE Access; 2169-3536; 2022-01-01; 10; 22274; 22287; 10.1109/ACCESS.2022.3153119; 9718079; Using HW/SW Codesign for Deep Neural Network Hardware Accelerator Targeting Low-Resources Embedded Processors; Erez Manor (0; https://orcid.org/0000-0002-2708-5628); Shlomo Greenberg (1; https://orcid.org/0000-0002-1385-8394); Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Be’er Sheva, Israel; Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Be’er Sheva, Israel; https://ieeexplore.ieee.org/document/9718079/; HW/SW codesign; SDF Graph; extensible processors; MLPerf tiny |
title | Using HW/SW Codesign for Deep Neural Network Hardware Accelerator Targeting Low-Resources Embedded Processors |
title_sort | using hw sw codesign for deep neural network hardware accelerator targeting low resources embedded processors |
topic | HW/SW codesign; SDF Graph; extensible processors; MLPerf tiny |
url | https://ieeexplore.ieee.org/document/9718079/ |
work_keys_str_mv | AT erezmanor usinghwswcodesignfordeepneuralnetworkhardwareacceleratortargetinglowresourcesembeddedprocessors AT shlomogreenberg usinghwswcodesignfordeepneuralnetworkhardwareacceleratortargetinglowresourcesembeddedprocessors |
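
The description above outlines selecting a subset of candidate accelerators from a directed acyclic task graph so that a real-time constraint is met within an area limit. The following is a minimal Python sketch of that general idea, not the authors' algorithm: it swaps in a simple greedy heuristic (largest time saved per unit of area first) and assumes the nodes run one after another. The `Node` class, the `partition` function, the three-stage graph, and all timing and area values are hypothetical and chosen only for illustration. The final line redoes the arithmetic for the KWS figures quoted in the record (206 s versus 181 ms).

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Node:
    name: str
    sw_time: float  # runtime in software, seconds (made-up value)
    hw_time: float  # runtime on a dedicated accelerator, seconds (made-up value)
    area: float     # accelerator area cost, arbitrary units (made-up value)


def partition(nodes, area_budget, deadline):
    """Greedy HW/SW split: move nodes to hardware, best time-saved-per-area
    first, until the deadline is met or no remaining candidate fits."""
    in_hw = set()
    used_area = 0.0
    # Assumption: nodes execute sequentially, so total runtime is a plain sum.
    runtime = sum(n.sw_time for n in nodes)
    candidates = sorted(
        (n for n in nodes if n.hw_time < n.sw_time and n.area > 0),
        key=lambda n: (n.sw_time - n.hw_time) / n.area,
        reverse=True,
    )
    for n in candidates:
        if runtime <= deadline:
            break  # constraint already satisfied, stop spending area
        if used_area + n.area > area_budget:
            continue  # this accelerator does not fit in the remaining area
        in_hw.add(n.name)
        used_area += n.area
        runtime -= n.sw_time - n.hw_time
    return in_hw, runtime, used_area


# Hypothetical three-stage pipeline: conv -> dense -> softmax.
nodes = [
    Node("conv", sw_time=100.0, hw_time=0.1, area=5.0),
    Node("dense", sw_time=50.0, hw_time=0.05, area=4.0),
    Node("softmax", sw_time=1.0, hw_time=0.5, area=3.0),
]
# Each key maps a node to the set of nodes that must run before it.
preds = {"conv": set(), "dense": {"conv"}, "softmax": {"dense"}}
schedule = list(TopologicalSorter(preds).static_order())

in_hw, runtime, used_area = partition(nodes, area_budget=9.0, deadline=2.0)
print("execution order:", schedule)
print("accelerated nodes:", sorted(in_hw))
print("runtime (s):", round(runtime, 3), "area used:", used_area)

# Speedup implied by the KWS figures in the record: 206 s down to 181 ms.
kws_speedup = 206 / 0.181
print("KWS speedup:", round(kws_speedup))
```

A greedy ratio heuristic is used here only because it is short and easy to follow; it does not guarantee an optimal partition, and a real flow would also need to account for parallel execution and data-transfer overhead between hardware and software.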