A High Performance Reconfigurable Hardware Architecture for Lightweight Convolutional Neural Network

Bibliographic Details
Main Authors: Fubang An, Lingli Wang, Xuegong Zhou
Format: Article
Language: English
Published: MDPI AG, 2023-06-01
Series: Electronics
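Subjects: lightweight convolutional neural network; EfficientNet; reconfigurable hardware architecture; FPGA implementation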
Online Access:https://www.mdpi.com/2079-9292/12/13/2847
author Fubang An
Lingli Wang
Xuegong Zhou
collection DOAJ
description Since Google proposed the lightweight convolutional neural network EfficientNet in 2019, the model family has quickly become popular because it delivers superior performance with a small number of parameters. However, existing convolutional neural network hardware accelerators for EfficientNet still leave considerable room for improvement in the performance of the depthwise convolution, the squeeze-and-excitation module and the nonlinear activation functions. In this paper, we first design a reconfigurable register array and computational kernel to accelerate the depthwise convolution. Next, we propose a vector unit that implements the nonlinear activation functions and the scale operation. An exchangeable-sequence dual-computational kernel architecture is proposed to improve performance and utilization. In addition, memory architectures are designed to complete the hardware accelerator around this computing architecture. Finally, to evaluate its performance, the accelerator is implemented on a Xilinx XCVU37P FPGA. The results show that the proposed accelerator runs at a main system clock frequency of 300 MHz with the DSP kernel clocked at 600 MHz. On our architecture, EfficientNet-B3 reaches 69.50 FPS and 255.22 GOPS. Compared with the latest EfficientNet-B3 accelerator, which uses the same FPGA development board, the proposed accelerator achieves a 1.28-fold improvement in single-core performance and a 1.38-fold improvement in per-DSP performance.
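For readers less familiar with the layers named in the abstract, the following is a minimal pure-Python reference sketch of what they compute: a stride-1 depthwise convolution, a squeeze-and-excitation block whose sigmoid gate drives the per-channel scale operation, and the Swish (x * sigmoid(x)) activation that EfficientNet uses. The function names, the stride-1 zero-padding choice and the bias-free fully connected layers are illustrative assumptions. The sketch describes the arithmetic the accelerator must perform, not the paper's register array, vector unit or dataflow.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swish(x: float) -> float:
    # Swish / SiLU activation used throughout EfficientNet: x * sigmoid(x).
    return x * sigmoid(x)


def depthwise_conv2d(fmap, kernels, pad):
    """Stride-1 depthwise convolution with zero padding (illustrative helper).

    fmap:    list of C channels, each an H x W list of lists
    kernels: list of C kernels, each K x K; channel c uses only kernels[c]
    """
    out = []
    for ch, k in zip(fmap, kernels):
        h, w, ks = len(ch), len(ch[0]), len(k)
        oh, ow = h + 2 * pad - ks + 1, w + 2 * pad - ks + 1
        plane = [[0.0] * ow for _ in range(oh)]
        for y in range(oh):
            for x in range(ow):
                acc = 0.0
                for i in range(ks):
                    for j in range(ks):
                        iy, ix = y + i - pad, x + j - pad
                        if 0 <= iy < h and 0 <= ix < w:  # zero padding
                            acc += ch[iy][ix] * k[i][j]
                plane[y][x] = acc
        out.append(plane)
    return out


def squeeze_excite(fmap, w_reduce, w_expand):
    """Squeeze-and-excitation followed by the per-channel scale operation.

    w_reduce: R x C weights, w_expand: C x R weights (biases omitted, an assumption).
    """
    # Squeeze: global average pooling, one value per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excitation: reduce FC -> Swish -> expand FC -> sigmoid gate.
    hidden = [swish(sum(w * p for w, p in zip(row, pooled))) for row in w_reduce]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_expand]
    # Scale: multiply every element of channel c by gates[c].
    return [[[v * g for v in row] for row in ch] for ch, g in zip(fmap, gates)]


if __name__ == "__main__":
    x = [[[1.0] * 3 for _ in range(3)]]  # one 3x3 channel of ones
    k = [[[1.0] * 3 for _ in range(3)]]  # one 3x3 kernel of ones
    y = depthwise_conv2d(x, k, pad=1)
    print(y[0][1][1], y[0][0][0])  # 9.0 4.0
```

Because each channel is convolved only with its own kernel, a depthwise layer performs far fewer multiply-accumulates per output than a standard convolution, which is why it tends to underuse compute arrays designed for dense convolution.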
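The reported figures can be cross-checked with simple arithmetic. This is a minimal sketch, assuming GOPS equals total operations per frame multiplied by FPS and that the DSP kernel runs at an exact 2:1 ratio to the system clock. The helper names are hypothetical. Under these assumptions, 255.22 GOPS at 69.50 FPS implies about 3.67 GOP per EfficientNet-B3 inference, with an average frame period of roughly 14.39 ms.

```python
def ops_per_frame(gops: float, fps: float) -> float:
    # Billions of operations per inference implied by a throughput figure.
    return gops / fps


def frame_period_ms(fps: float) -> float:
    # Average frame period (reciprocal of throughput); true latency may
    # differ if the hardware pipelines several frames.
    return 1000.0 / fps


def dsp_cycles_per_system_cycle(dsp_mhz: float, sys_mhz: float) -> float:
    # How many DSP clock periods fit in one system clock period.
    return dsp_mhz / sys_mhz


# Figures quoted in the abstract for EfficientNet-B3 on the XCVU37P.
gop = ops_per_frame(255.22, 69.50)
print(f"{gop:.2f} GOP/frame, {frame_period_ms(69.50):.2f} ms/frame, "
      f"{dsp_cycles_per_system_cycle(600, 300):.0f} DSP cycles per system cycle")
# 3.67 GOP/frame, 14.39 ms/frame, 2 DSP cycles per system cycle
```

The roughly 3.67 GOP per frame is in line with the approximately 1.8 billion FLOPs listed for EfficientNet-B3 in the original EfficientNet paper, if those are read as multiply-accumulates and each is counted as two operations. This record does not state the paper's own operation-counting convention.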
first_indexed 2024-03-11T01:44:40Z
format Article
id doaj.art-e65b0f38922743b7a83e47387a66446a
institution Directory Open Access Journal
issn 2079-9292
language English
last_indexed 2024-03-11T01:44:40Z
publishDate 2023-06-01
publisher MDPI AG
record_format Article
series Electronics
affiliation Fubang An: School of Microelectronics, Fudan University, Shanghai 200433, China
Lingli Wang: School of Microelectronics, Fudan University, Shanghai 200433, China
Xuegong Zhou: Institute of Big Data, Fudan University, Shanghai 200433, China
citation Electronics 2023, 12(13), 2847
doi 10.3390/electronics12132847
title A High Performance Reconfigurable Hardware Architecture for Lightweight Convolutional Neural Network
topic lightweight convolutional neural network
EfficientNet
reconfigurable hardware architecture
FPGA implementation
url https://www.mdpi.com/2079-9292/12/13/2847