Multi-Model Inference Accelerator for Binary Convolutional Neural Networks
Binary convolutional neural networks (BCNNs) have shown good accuracy for small to medium neural network models. Their extreme quantization of weights and activations reduces off-chip data transfer and greatly reduces the computational complexity of convolutions. Further reduction in the complexity of a BCNN model for fast execution can be achieved with model size reduction, at the cost of network accuracy...
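The reduced convolution cost the abstract refers to comes from replacing multiply-accumulate with bitwise operations: when weights and activations are constrained to ±1, a dot product reduces to XNOR plus popcount on bit-packed words. The sketch below is a generic illustration of that standard trick, not the paper's implementation; the function name `binary_dot` and the bit-packing convention (bit = 1 means +1) are assumptions for illustration.

```python
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two n-element {-1, +1} vectors packed as bits.

    Convention (assumed here): bit i = 1 encodes +1, bit i = 0 encodes -1.
    XNOR marks positions where the signs agree; with p agreeing positions,
    the +/-1 dot product is p - (n - p) = 2*p - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR, truncated to n bits
    p = bin(agree).count("1")          # popcount
    return 2 * p - n

# a = [+1, +1, -1, +1] -> 0b1011 (bit 0 = first element)
# b = [+1, +1, -1, -1] -> 0b0011
# dot = 1 + 1 + 1 - 1 = 2
print(binary_dot(0b1011, 0b0011, 4))  # -> 2
```

A hardware accelerator performs the same XNOR/popcount over wide words in a single cycle, which is why binarization is attractive on low-density FPGAs.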
Main Authors: | André L. de Sousa, Mário P. Véstias, Horácio C. Neto |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-11-01 |
Series: | Electronics |
Subjects: | deep learning; binary convolutional neural network; dual-model inference; FPGA |
Online Access: | https://www.mdpi.com/2079-9292/11/23/3966 |
_version_ | 1797463323782938624 |
---|---|
author | André L. de Sousa; Mário P. Véstias; Horácio C. Neto
author_facet | André L. de Sousa; Mário P. Véstias; Horácio C. Neto
author_sort | André L. de Sousa |
collection | DOAJ |
description | Binary convolutional neural networks (BCNNs) have shown good accuracy for small to medium neural network models. Their extreme quantization of weights and activations reduces off-chip data transfer and greatly reduces the computational complexity of convolutions. Further reduction in the complexity of a BCNN model for fast execution can be achieved with model size reduction, at the cost of network accuracy. In this paper, a multi-model inference technique is proposed that reduces the execution time of the binarized inference process without reducing accuracy. The technique considers a cascade of neural network models with different computation/accuracy ratios. A parameterizable binarized neural network with different trade-offs between complexity and accuracy is used to obtain the multiple network models. We also propose a hardware accelerator to run multi-model inference in embedded systems. The multi-model inference accelerator is demonstrated on low-density Zynq-7010 and Zynq-7020 FPGA devices, classifying images from the CIFAR-10 dataset. The proposed accelerator improves the frame rate per LUT by 7.2× over previous solutions on a Zynq-7020 FPGA with similar accuracy. This shows the effectiveness of the multi-model inference technique and the efficiency of the proposed hardware accelerator. |
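The cascade described in the abstract can be sketched as follows: run the smallest, fastest model first and fall back to a larger, more accurate model only when the first model is not confident enough. This is a minimal sketch assuming a confidence-threshold fallback criterion; the `Model` signature, `cascade_predict` name, and threshold values are hypothetical, and the paper's actual selection mechanism may differ.

```python
from typing import Callable, List, Tuple

# A "model" here is any callable returning (predicted class, confidence).
Model = Callable[[object], Tuple[int, float]]

def cascade_predict(x, models: List[Model], thresholds: List[float]) -> int:
    """Try models in order of increasing cost; accept the first confident one.

    The last model's prediction is accepted unconditionally (its threshold
    should be 0.0), so every input gets a label.
    """
    for model, threshold in zip(models, thresholds):
        label, confidence = model(x)
        if confidence >= threshold:
            return label
    return models[-1](x)[0]

# Hypothetical stand-ins for a small/fast and a large/accurate BCNN:
small = lambda x: (0, 0.4)   # cheap model, low confidence on this input
large = lambda x: (1, 0.99)  # expensive model, high confidence
print(cascade_predict(None, [small, large], [0.9, 0.0]))  # -> 1
```

The average cost stays close to the small model's whenever most inputs are classified confidently by it, which is how the cascade speeds up inference without giving up the large model's accuracy on hard inputs.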
first_indexed | 2024-03-09T17:50:04Z |
format | Article |
id | doaj.art-16c1a73a96524956be6f025165878da9 |
institution | Directory Open Access Journal |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-09T17:50:04Z |
publishDate | 2022-11-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
spelling | doaj.art-16c1a73a96524956be6f025165878da9 2023-11-24T10:48:28Z; eng; MDPI AG; Electronics 2079-9292, 2022-11-01, vol. 11, no. 23, art. 3966; DOI 10.3390/electronics11233966; Multi-Model Inference Accelerator for Binary Convolutional Neural Networks; André L. de Sousa (INESC-ID, Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, 1959-007 Lisbon, Portugal); Mário P. Véstias (INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal); Horácio C. Neto (INESC-ID, Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, 1959-007 Lisbon, Portugal); https://www.mdpi.com/2079-9292/11/23/3966; deep learning; binary convolutional neural network; dual-model inference; FPGA |
spellingShingle | André L. de Sousa; Mário P. Véstias; Horácio C. Neto; Multi-Model Inference Accelerator for Binary Convolutional Neural Networks; Electronics; deep learning; binary convolutional neural network; dual-model inference; FPGA |
title | Multi-Model Inference Accelerator for Binary Convolutional Neural Networks |
title_full | Multi-Model Inference Accelerator for Binary Convolutional Neural Networks |
title_fullStr | Multi-Model Inference Accelerator for Binary Convolutional Neural Networks |
title_full_unstemmed | Multi-Model Inference Accelerator for Binary Convolutional Neural Networks |
title_short | Multi-Model Inference Accelerator for Binary Convolutional Neural Networks |
title_sort | multi model inference accelerator for binary convolutional neural networks |
topic | deep learning; binary convolutional neural network; dual-model inference; FPGA |
url | https://www.mdpi.com/2079-9292/11/23/3966 |
work_keys_str_mv | AT andreldesousa multimodelinferenceacceleratorforbinaryconvolutionalneuralnetworks AT mariopvestias multimodelinferenceacceleratorforbinaryconvolutionalneuralnetworks AT horaciocneto multimodelinferenceacceleratorforbinaryconvolutionalneuralnetworks |