A High Performance Reconfigurable Hardware Architecture for Lightweight Convolutional Neural Network

Since the lightweight convolutional neural network EfficientNet was proposed by Google in 2019, the series of models have quickly become very popular due to their superior performance with a small number of parameters. However, the existing convolutional neural network hardware accelerators for Effi...

Bibliographic Details
Main Authors:	Fubang An, Lingli Wang, Xuegong Zhou
Format:	Article
Language:	English
Published:	MDPI AG 2023-06-01
Series:	Electronics
Subjects:	lightweight convolutional neural network; EfficientNet; reconfigurable hardware architecture; FPGA implementation
Online Access:	https://www.mdpi.com/2079-9292/12/13/2847

_version_	1797591867324366848
author	Fubang An, Lingli Wang, Xuegong Zhou
author_facet	Fubang An, Lingli Wang, Xuegong Zhou
author_sort	Fubang An
collection	DOAJ
description	Since the lightweight convolutional neural network EfficientNet was proposed by Google in 2019, the series of models have quickly become very popular due to their superior performance with a small number of parameters. However, the existing convolutional neural network hardware accelerators for EfficientNet still have much room to improve the performance of the depthwise convolution, squeeze-and-excitation module and nonlinear activation functions. In this paper, we first design a reconfigurable register array and computational kernel to accelerate the depthwise convolution. Next, we propose a vector unit to implement the nonlinear activation functions and the scale operation. An exchangeable-sequence dual-computational kernel architecture is proposed to improve the performance and the utilization. In addition, the memory architectures are designed to complete the hardware accelerator for the above computing architecture. Finally, in order to evaluate the performance of the hardware accelerator, the accelerator is implemented based on Xilinx XCVU37P. The results show that the proposed accelerator can work at the main system clock frequency of 300 MHz with the DSP kernel at 600 MHz. The performance of EfficientNet-B3 in our architecture can reach 69.50 FPS and 255.22 GOPS. Compared with the latest EfficientNet-B3 accelerator, which uses the same FPGA development board, the accelerator proposed in this paper can achieve a 1.28-fold improvement of single-core performance and 1.38-fold improvement of performance of each DSP.
first_indexed	2024-03-11T01:44:40Z
format	Article
id	doaj.art-e65b0f38922743b7a83e47387a66446a
institution	Directory Open Access Journal
issn	2079-9292
language	English
last_indexed	2024-03-11T01:44:40Z
publishDate	2023-06-01
publisher	MDPI AG
record_format	Article
series	Electronics
spelling	doaj.art-e65b0f38922743b7a83e47387a66446a | 2023-11-18T16:24:16Z | eng | MDPI AG | Electronics | 2079-9292 | 2023-06-01 | 12 | 13 | 2847 | 10.3390/electronics12132847 | A High Performance Reconfigurable Hardware Architecture for Lightweight Convolutional Neural Network | Fubang An (School of Microelectronics, Fudan University, Shanghai 200433, China); Lingli Wang (School of Microelectronics, Fudan University, Shanghai 200433, China); Xuegong Zhou (Institute of Big Data, Fudan University, Shanghai 200433, China) | https://www.mdpi.com/2079-9292/12/13/2847 | lightweight convolutional neural network; EfficientNet; reconfigurable hardware architecture; FPGA implementation
title	A High Performance Reconfigurable Hardware Architecture for Lightweight Convolutional Neural Network
topic	lightweight convolutional neural network; EfficientNet; reconfigurable hardware architecture; FPGA implementation
url	https://www.mdpi.com/2079-9292/12/13/2847
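
Note on the reported figures: the description field gives two throughput numbers for EfficientNet-B3, 69.50 FPS and 255.22 GOPS. Dividing one by the other gives the workload per frame that the two figures imply. The short Python sketch below does that arithmetic. It assumes both figures were measured on the same run and that GOPS counts operations the way the authors count them; the record does not state either point, and the function name is only illustrative.

```python
# Figures quoted in the record's description field
# (EfficientNet-B3 on the Xilinx XCVU37P implementation).
REPORTED_FPS = 69.50      # frames per second
REPORTED_GOPS = 255.22    # giga-operations per second


def giga_ops_per_frame(gops: float, fps: float) -> float:
    """Return the giga-operations per frame implied by a GOPS/FPS pair.

    Assumption (not stated in the record): both values describe the same
    measurement, so their ratio is the work done for one inference.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return gops / fps


if __name__ == "__main__":
    per_frame = giga_ops_per_frame(REPORTED_GOPS, REPORTED_FPS)
    print(f"Implied workload: {per_frame:.2f} GOP per frame")
```

Under these assumptions the two figures imply roughly 3.67 giga-operations per EfficientNet-B3 frame. This is a consistency check on the quoted numbers, not a result taken from the paper.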

