DARKSIDE: A Heterogeneous RISC-V Compute Cluster for Extreme-Edge On-Chip DNN Inference and Training

Bibliographic Details
Main Authors: Angelo Garofalo, Yvan Tortorella, Matteo Perotti, Luca Valente, Alessandro Nadalini, Luca Benini, Davide Rossi, Francesco Conti
Format: Article
Language: English
Published: IEEE 2022-01-01
Series: IEEE Open Journal of the Solid-State Circuits Society
Subjects: Heterogeneous cluster; tensor product engine (TPE); ultralow-power AI
Online Access: https://ieeexplore.ieee.org/document/9903915/
Collection: DOAJ

Abstract
On-chip deep neural network (DNN) inference and training at the Extreme-Edge (TinyML) impose strict latency, throughput, accuracy, and flexibility requirements. Heterogeneous clusters are promising solutions to meet the challenge, combining the flexibility of DSP-enhanced cores with the performance and energy boost of dedicated accelerators. We present DARKSIDE, a System-on-Chip with a heterogeneous cluster of eight RISC-V cores enhanced with 2-b to 32-b mixed-precision integer arithmetic. To boost performance and efficiency on key compute-intensive DNN kernels, the cluster is enriched with three digital accelerators: 1) a specialized engine for low-data-reuse depthwise convolution kernels (up to 30 MAC/cycle); 2) a minimal-overhead datamover to marshal 1–32-b data on the fly; and 3) a 16-b floating-point tensor product engine (TPE) for tiled matrix-multiplication acceleration. DARKSIDE is implemented in 65-nm CMOS technology. The cluster achieves a peak integer performance of 65 GOPS and a peak efficiency of 835 GOPS/W when working on 2-b integer DNN kernels. When targeting floating-point tensor operations, the TPE provides up to 18.2 GFLOPS of performance or 300 GFLOPS/W of efficiency, enough to enable on-chip floating-point training at competitive speed coupled with ultralow-power quantized inference.
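
To make the "2-b to 32-b mixed-precision integer arithmetic" more concrete, the Python sketch below packs signed low-bit weights into 32-bit words and computes a dot product against wider integer activations. It is an illustration only, not DARKSIDE's implementation: the abstract does not specify the cores' instruction set or memory layout, so the function names (`pack`, `unpack`, `dot_mixed`) and the packing scheme (lowest field first, no field straddling a word boundary, widths of 2, 4, 8, 16, or 32 bits) are our assumptions.

```python
# Hypothetical sketch: signed low-bit integers packed into 32-bit words,
# lowest field first, no field straddling a word boundary (our assumption,
# not DARKSIDE's documented layout).

SUPPORTED_BITS = (2, 4, 8, 16, 32)  # widths that divide 32 evenly


def pack(values, bits):
    """Pack signed two's-complement integers of width `bits` into 32-bit words."""
    if bits not in SUPPORTED_BITS:
        raise ValueError(f"unsupported width: {bits}")
    per_word = 32 // bits
    mask = (1 << bits) - 1
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    words = [0] * ((len(values) + per_word - 1) // per_word)  # ceiling division
    for i, v in enumerate(values):
        if not lo <= v <= hi:
            raise ValueError(f"{v} does not fit in {bits} signed bits")
        # v & mask keeps the low `bits` bits of the two's-complement encoding
        words[i // per_word] |= (v & mask) << (bits * (i % per_word))
    return words


def unpack(words, bits, count):
    """Recover `count` signed integers of width `bits` from packed words."""
    per_word = 32 // bits
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    out = []
    for i in range(count):
        field = (words[i // per_word] >> (bits * (i % per_word))) & mask
        # Sign-extend: a set top bit means the field encodes a negative value
        out.append(field - (1 << bits) if field & sign_bit else field)
    return out


def dot_mixed(acts, w_words, w_bits):
    """Dot product of integer activations with packed low-bit weights.

    Python ints never overflow; a hardware kernel would instead accumulate
    into a fixed-width (e.g. 32-bit) register.
    """
    weights = unpack(w_words, w_bits, len(acts))
    return sum(a * w for a, w in zip(acts, weights))


if __name__ == "__main__":
    acts = list(range(1, 19))                 # 18 activations: 1..18
    weights = [1, -1, -2, 0] * 4 + [1, -2]    # 18 signed 2-bit weights
    packed = pack(weights, bits=2)            # 16 weights per word -> 2 words
    print([hex(w) for w in packed])           # ['0x2d2d2d2d', '0x9']
    print(dot_mixed(acts, packed, w_bits=2))  # -95
```

Packing many narrow fields into one word is what allows SIMD-style instructions to operate on several low-precision operands at once; the sketch unpacks them one by one only to keep the arithmetic explicit.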

Publication Details
Journal: IEEE Open Journal of the Solid-State Circuits Society, vol. 2, pp. 231–243, 2022
ISSN: 2644-1349
DOI: 10.1109/OJSSCS.2022.3210082
Affiliations: Angelo Garofalo, Yvan Tortorella, Luca Valente, Alessandro Nadalini, Luca Benini, Davide Rossi, and Francesco Conti: Department of Electrical, Electronic and Information Engineering, University of Bologna, Bologna, Italy. Matteo Perotti: IIS Integrated Systems Laboratory, ETH Zürich, Zürich, Switzerland.
ORCID iDs: Angelo Garofalo https://orcid.org/0000-0002-7495-6895; Yvan Tortorella https://orcid.org/0000-0001-8248-5731; Luca Valente https://orcid.org/0000-0002-7458-477X; Davide Rossi https://orcid.org/0000-0002-0651-5393; Francesco Conti https://orcid.org/0000-0002-7924-933X