Multi-Model Inference Accelerator for Binary Convolutional Neural Networks
Binary convolutional neural networks (BCNNs) have shown good accuracy for small to medium neural network models. Their extreme quantization of weights and activations reduces off-chip data transfer and greatly reduces the computational complexity of convolutions. Further reduction in the complexity of a BCNN model for fast execution can be achieved with model size reduction, at the cost of network accuracy...
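The reduced convolution cost the abstract refers to comes from replacing multiply-accumulate with bitwise operations: when weights and activations are constrained to ±1, a dot product reduces to XNOR plus popcount on bit-packed words. The sketch below is a generic illustration of that standard trick, not the paper's implementation; the function name `binary_dot` and the bit-packing convention (bit = 1 means +1) are assumptions for illustration.

```python
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two n-element {-1, +1} vectors packed as bits.

    Convention (assumed here): bit i = 1 encodes +1, bit i = 0 encodes -1.
    XNOR marks positions where the signs agree; with p agreeing positions,
    the +/-1 dot product is p - (n - p) = 2*p - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR, truncated to n bits
    p = bin(agree).count("1")          # popcount
    return 2 * p - n

# a = [+1, +1, -1, +1] -> 0b1011 (bit 0 = first element)
# b = [+1, +1, -1, -1] -> 0b0011
# dot = 1 + 1 + 1 - 1 = 2
print(binary_dot(0b1011, 0b0011, 4))  # -> 2
```

A hardware accelerator performs the same XNOR/popcount over wide words in a single cycle, which is why binarization is attractive on low-density FPGAs.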
Main Authors: | André L. de Sousa, Mário P. Véstias, Horácio C. Neto |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-11-01 |
Series: | Electronics |
Subjects: | deep learning; binary convolutional neural network; dual-model inference; FPGA |
Online Access: | https://www.mdpi.com/2079-9292/11/23/3966 |
_version_ | 1797463323782938624 |
---|---|
author | André L. de Sousa; Mário P. Véstias; Horácio C. Neto
author_facet | André L. de Sousa; Mário P. Véstias; Horácio C. Neto
author_sort | André L. de Sousa |
collection | DOAJ |
description | Binary convolutional neural networks (BCNNs) have shown good accuracy for small to medium neural network models. Their extreme quantization of weights and activations reduces off-chip data transfer and greatly reduces the computational complexity of convolutions. Further reduction in the complexity of a BCNN model for fast execution can be achieved with model size reduction, at the cost of network accuracy. In this paper, a multi-model inference technique is proposed that reduces the execution time of the binarized inference process without reducing accuracy. The technique considers a cascade of neural network models with different computation/accuracy ratios. A parameterizable binarized neural network with different trade-offs between complexity and accuracy is used to obtain the multiple network models. We also propose a hardware accelerator to run multi-model inference in embedded systems. The multi-model inference accelerator is demonstrated on low-density Zynq-7010 and Zynq-7020 FPGA devices, classifying images from the CIFAR-10 dataset. The proposed accelerator improves the frame rate per LUT by 7.2× over previous solutions on a Zynq-7020 FPGA with similar accuracy. This shows the effectiveness of the multi-model inference technique and the efficiency of the proposed hardware accelerator. |
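The cascade described in the abstract can be sketched as follows: run the smallest, fastest model first and fall back to a larger, more accurate model only when the first model is not confident enough. This is a minimal sketch assuming a confidence-threshold fallback criterion; the `Model` signature, `cascade_predict` name, and threshold values are hypothetical, and the paper's actual selection mechanism may differ.

```python
from typing import Callable, List, Tuple

# A "model" here is any callable returning (predicted class, confidence).
Model = Callable[[object], Tuple[int, float]]

def cascade_predict(x, models: List[Model], thresholds: List[float]) -> int:
    """Try models in order of increasing cost; accept the first confident one.

    The last model's prediction is accepted unconditionally (its threshold
    should be 0.0), so every input gets a label.
    """
    for model, threshold in zip(models, thresholds):
        label, confidence = model(x)
        if confidence >= threshold:
            return label
    return models[-1](x)[0]

# Hypothetical stand-ins for a small/fast and a large/accurate BCNN:
small = lambda x: (0, 0.4)   # cheap model, low confidence on this input
large = lambda x: (1, 0.99)  # expensive model, high confidence
print(cascade_predict(None, [small, large], [0.9, 0.0]))  # -> 1
```

The average cost stays close to the small model's whenever most inputs are classified confidently by it, which is how the cascade speeds up inference without giving up the large model's accuracy on hard inputs.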
first_indexed | 2024-03-09T17:50:04Z |
format | Article |
id | doaj.art-16c1a73a96524956be6f025165878da9 |
institution | Directory Open Access Journal |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-09T17:50:04Z |
publishDate | 2022-11-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
spelling | doaj.art-16c1a73a96524956be6f025165878da9 2023-11-24T10:48:28Z; eng; MDPI AG; Electronics 2079-9292, 2022-11-01, vol. 11, no. 23, art. 3966; DOI 10.3390/electronics11233966; Multi-Model Inference Accelerator for Binary Convolutional Neural Networks; André L. de Sousa (INESC-ID, Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, 1959-007 Lisbon, Portugal); Mário P. Véstias (INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal); Horácio C. Neto (INESC-ID, Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, 1959-007 Lisbon, Portugal); https://www.mdpi.com/2079-9292/11/23/3966; deep learning; binary convolutional neural network; dual-model inference; FPGA |
spellingShingle | André L. de Sousa; Mário P. Véstias; Horácio C. Neto; Multi-Model Inference Accelerator for Binary Convolutional Neural Networks; Electronics; deep learning; binary convolutional neural network; dual-model inference; FPGA |
title | Multi-Model Inference Accelerator for Binary Convolutional Neural Networks |
title_full | Multi-Model Inference Accelerator for Binary Convolutional Neural Networks |
title_fullStr | Multi-Model Inference Accelerator for Binary Convolutional Neural Networks |
title_full_unstemmed | Multi-Model Inference Accelerator for Binary Convolutional Neural Networks |
title_short | Multi-Model Inference Accelerator for Binary Convolutional Neural Networks |
title_sort | multi model inference accelerator for binary convolutional neural networks |
topic | deep learning; binary convolutional neural network; dual-model inference; FPGA |
url | https://www.mdpi.com/2079-9292/11/23/3966 |
work_keys_str_mv | AT andreldesousa multimodelinferenceacceleratorforbinaryconvolutionalneuralnetworks AT mariopvestias multimodelinferenceacceleratorforbinaryconvolutionalneuralnetworks AT horaciocneto multimodelinferenceacceleratorforbinaryconvolutionalneuralnetworks |