DARKSIDE: A Heterogeneous RISC-V Compute Cluster for Extreme-Edge On-Chip DNN Inference and Training
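Main Authors: | Angelo Garofalo, Yvan Tortorella, Matteo Perotti, Luca Valente, Alessandro Nadalini, Luca Benini, Davide Rossi, Francesco Conti |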
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2022-01-01 |
Series: | IEEE Open Journal of the Solid-State Circuits Society |
Subjects: | Heterogeneous cluster; tensor product engine (TPE); ultralow-power AI |
Online Access: | https://ieeexplore.ieee.org/document/9903915/ |
author | Angelo Garofalo, Yvan Tortorella, Matteo Perotti, Luca Valente, Alessandro Nadalini, Luca Benini, Davide Rossi, Francesco Conti |
---|---|
collection | DOAJ |
description | On-chip deep neural network (DNN) inference and training at the Extreme-Edge (TinyML) impose strict latency, throughput, accuracy, and flexibility requirements. Heterogeneous clusters are promising solutions to meet the challenge, combining the flexibility of DSP-enhanced cores with the performance and energy boost of dedicated accelerators. We present DARKSIDE, a System-on-Chip with a heterogeneous cluster of eight RISC-V cores enhanced with 2-b to 32-b mixed-precision integer arithmetic. To boost the performance and efficiency on key compute-intensive DNN kernels, the cluster is enriched with three digital accelerators: 1) a specialized engine for low-data-reuse depthwise convolution kernels (up to 30 MAC/cycle); 2) a minimal overhead datamover to marshal 1–32-b data on-the-fly; and 3) a 16-b floating-point tensor product engine (TPE) for tiled matrix-multiplication acceleration. DARKSIDE is implemented in 65-nm CMOS technology. The cluster achieves a peak integer performance of 65 GOPS and a peak efficiency of 835 GOPS/W when working on 2-b integer DNN kernels. When targeting floating-point tensor operations, the TPE provides up to 18.2 GFLOPS of performance or 300 GFLOPS/W of efficiency—enough to enable on-chip floating-point training at competitive speed coupled with ultralow power quantized inference. |
first_indexed | 2024-04-24T06:43:17Z |
format | Article |
id | doaj.art-6e8b4a55628e4539a6900c8db66fbba3 |
institution | Directory of Open Access Journals |
issn | 2644-1349 |
language | English |
last_indexed | 2024-04-24T06:43:17Z |
publishDate | 2022-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Open Journal of the Solid-State Circuits Society |
volume | 2 |
pages | 231–243 |
doi | 10.1109/OJSSCS.2022.3210082 |
orcid | Angelo Garofalo https://orcid.org/0000-0002-7495-6895; Yvan Tortorella https://orcid.org/0000-0001-8248-5731; Luca Valente https://orcid.org/0000-0002-7458-477X; Davide Rossi https://orcid.org/0000-0002-0651-5393; Francesco Conti https://orcid.org/0000-0002-7924-933X |
affiliation | Matteo Perotti: IIS Integrated Systems Laboratory, ETH Zürich, Zürich, Switzerland; Angelo Garofalo, Yvan Tortorella, Luca Valente, Alessandro Nadalini, Luca Benini, Davide Rossi, Francesco Conti: Department of Electrical, Electronic and Information Engineering, University of Bologna, Bologna, Italy |
title | DARKSIDE: A Heterogeneous RISC-V Compute Cluster for Extreme-Edge On-Chip DNN Inference and Training |
topic | Heterogeneous cluster; tensor product engine (TPE); ultralow-power AI |
url | https://ieeexplore.ieee.org/document/9903915/ |
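The description field mentions 2-b to 32-b mixed-precision integer arithmetic and a datamover that marshals 1–32-b data. As a rough illustration of why sub-byte formats matter, the Python sketch below packs signed 2-, 4-, or 8-bit integers into 32-bit words and computes a lane-wise dot product on them. It is a minimal sketch under stated assumptions: the functions `pack`, `unpack`, and `packed_dot`, the lane ordering (lane 0 in the least-significant bits), and the two's-complement lane encoding are hypothetical choices made for this example; the record does not describe DARKSIDE's actual instruction encoding or datamover layout.

```python
"""Hypothetical sketch: signed sub-byte integers packed into 32-bit words.

Assumptions (not from the DARKSIDE paper): lane 0 sits in the least-significant
bits, lanes use two's complement, and all function names are illustrative.
"""

WORD_BITS = 32


def pack(values: list[int], bits: int) -> int:
    """Pack WORD_BITS // bits signed integers of width `bits` into one word."""
    lanes = WORD_BITS // bits
    if len(values) != lanes:
        raise ValueError(f"expected {lanes} values for {bits}-bit lanes")
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    mask = (1 << bits) - 1
    word = 0
    for i, v in enumerate(values):
        if not lo <= v <= hi:
            raise ValueError(f"{v} does not fit in {bits} signed bits")
        word |= (v & mask) << (i * bits)  # store the two's-complement lane bits
    return word


def unpack(word: int, bits: int) -> list[int]:
    """Recover the signed lanes of a packed word, sign-extending each lane."""
    mask = (1 << bits) - 1
    out = []
    for i in range(WORD_BITS // bits):
        lane = (word >> (i * bits)) & mask
        if lane >> (bits - 1):  # sign bit set: convert to a negative value
            lane -= 1 << bits
        out.append(lane)
    return out


def packed_dot(a_word: int, b_word: int, bits: int, acc: int = 0) -> int:
    """Add the lane-wise products of two packed words to a wide accumulator."""
    return acc + sum(x * y for x, y in zip(unpack(a_word, bits), unpack(b_word, bits)))


if __name__ == "__main__":
    a = pack([1, -2, 0, 1] * 4, bits=2)   # 16 activations, 2 bits each
    w = pack([-1, 1, 1, -2] * 4, bits=2)  # 16 weights, 2 bits each
    print(packed_dot(a, w, bits=2))        # prints -20
```

With 2-bit lanes, each 32-bit word carries 16 operands, so a single word-sized load can feed 16 multiply-accumulates; a core with packed-SIMD support would typically perform this lane-wise product in one instruction rather than in a software loop.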