Multi-Model Inference Accelerator for Binary Convolutional Neural Networks

Binary convolutional neural networks (BCNNs) have shown good accuracy for small to medium neural network models. Their extreme quantization of weights and activations reduces off-chip data transfer and greatly reduces the computational complexity of convolutions. The complexity of a BCNN model can be reduced further, for faster execution, by shrinking the model, but at the cost of network accuracy. In this paper, a multi-model inference technique is proposed that reduces the execution time of the binarized inference process without reducing accuracy. The technique uses a cascade of neural network models with different computation/accuracy ratios, obtained from a parameterizable binarized neural network with different trade-offs between complexity and accuracy. We also propose a hardware accelerator to run multi-model inference in embedded systems. The multi-model inference accelerator is demonstrated on low-density Zynq-7010 and Zynq-7020 FPGA devices, classifying images from the CIFAR-10 dataset. On the Zynq-7020, the proposed accelerator improves the frame rate per LUT by 7.2× over previous solutions with similar accuracy, showing the effectiveness of the multi-model inference technique and the efficiency of the proposed hardware accelerator.
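The complexity reduction described in the abstract comes from the arithmetic of binarization: with weights and activations constrained to {-1, +1} and packed as bits, each dot product inside a convolution reduces to an XNOR followed by a popcount, via the identity dot = 2·popcount(xnor(a, w)) − n for n-element vectors. The following Python sketch illustrates that identity only; it is a minimal illustration, not the paper's accelerator datapath, and the helper names (pack_bits, binary_dot) are hypothetical.

    import numpy as np

    def pack_bits(x):
        """Pack a {-1, +1} vector into an integer bit mask (+1 -> 1, -1 -> 0)."""
        mask = 0
        for b in (x > 0).astype(np.uint8):
            mask = (mask << 1) | int(b)
        return mask

    def binary_dot(a_mask, w_mask, n):
        """Dot product of two n-element {-1, +1} vectors from their bit masks.

        XNOR marks matching positions; each match contributes +1 and each
        mismatch -1, so dot = matches - (n - matches) = 2*popcount - n.
        """
        xnor = ~(a_mask ^ w_mask) & ((1 << n) - 1)  # keep only the n valid bits
        return 2 * bin(xnor).count("1") - n

    # Sanity check against the ordinary dot product.
    rng = np.random.default_rng(0)
    a = rng.choice([-1, 1], size=64)
    w = rng.choice([-1, 1], size=64)
    assert binary_dot(pack_bits(a), pack_bits(w), 64) == int(a @ w)

On an FPGA, the XNOR and popcount map directly onto LUT logic rather than multipliers, which is why frame rate per LUT is the natural efficiency metric quoted in the abstract.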

Bibliographic Details
Main Authors: André L. de Sousa, Mário P. Véstias, Horácio C. Neto
Format: Article
Language: English
Published: MDPI AG, 2022-11-01
Series: Electronics, Vol. 11, Iss. 23, Art. 3966
ISSN: 2079-9292
DOI: 10.3390/electronics11233966
Subjects: deep learning; binary convolutional neural network; dual-model inference; FPGA
Online Access: https://www.mdpi.com/2079-9292/11/23/3966
Author Affiliations: INESC-ID, Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, 1959-007 Lisbon, Portugal; INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal
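A note on the multi-model technique summarized in the abstract: the cascade runs the models from cheapest to most accurate and escalates an input to the next model only when the current prediction is not confident enough. The Python sketch below shows this control flow under assumed details; the softmax-confidence threshold and the list-of-callables interface are illustrative assumptions, not the paper's exact exit criterion.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def cascade_classify(x, models, threshold=0.9):
        """Run models ordered by increasing cost/accuracy; stop once confident.

        Each element of `models` maps an input to class logits. If no model
        reaches the confidence threshold, the last model's answer is used.
        """
        for model in models:
            probs = softmax(model(x))
            label = int(np.argmax(probs))
            if probs[label] >= threshold:
                return label  # confident: skip the more expensive models
        return label          # fall back to the most accurate model

    # Toy usage with two stand-in "models" (random linear classifiers).
    rng = np.random.default_rng(1)
    small = lambda x, W=rng.normal(size=(10, 32)): W @ x
    large = lambda x, W=rng.normal(size=(10, 32)): W @ x
    print(cascade_classify(rng.normal(size=32), [small, large]))

Because most inputs exit at the first, cheapest model, the average cost per image stays close to that of the small model while hard inputs still receive the large model's accuracy; this is how a cascade can cut execution time without an accuracy drop.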