A High-Performance FPGA-Based Depthwise Separable Convolution Accelerator

Depthwise separable convolution (DSC) significantly reduces the number of parameters and floating-point operations with an acceptable loss of accuracy and has been widely used in various lightweight convolutional neural network (CNN) models. In practical applications, however, DSC accelerators based on graphics processi...

Bibliographic Details
Main Authors:	Jiye Huang, Xin Liu, Tongdong Guo, Zhijin Zhao
Format:	Article
Language:	English
Published:	MDPI AG 2023-03-01
Series:	Electronics
Subjects:	convolutional neural network; depthwise separable convolution; field programmable gate array; hardware accelerator; MobileNetV2
Online Access:	https://www.mdpi.com/2079-9292/12/7/1571

author	Jiye Huang, Xin Liu, Tongdong Guo, Zhijin Zhao
collection	DOAJ
description	Depthwise separable convolution (DSC) significantly reduces the number of parameters and floating-point operations with an acceptable loss of accuracy and has been widely used in various lightweight convolutional neural network (CNN) models. In practical applications, however, DSC accelerators based on graphics processing units (GPUs) cannot fully exploit the performance of DSC and are unsuitable for mobile application scenarios. Moreover, low resource utilization due to idle engines is a common problem in DSC accelerator design. In this paper, a high-performance DSC hardware accelerator based on field-programmable gate arrays (FPGAs) is proposed. A highly reusable and scalable multiplication and accumulation engine is proposed to improve the utilization of computational resources. An efficient convolution algorithm is proposed for each of depthwise convolution (DWC) and pointwise convolution (PWC) to reduce on-chip memory occupancy. The proposed convolution algorithms also achieve partial fusion between PWC and DWC and improve off-chip memory access efficiency. To maximise bandwidth utilization and reduce latency when reading feature maps, an address mapping method for off-chip accesses is proposed. The performance of the proposed accelerator is demonstrated by implementing MobileNetV2 on an Intel Arria 10 GX660 FPGA using Verilog HDL. The experimental results show that the proposed DSC accelerator achieves a performance of 205.1 FPS, 128.8 GFLOPS, and 0.24 GOPS/DSP for input images of size 224 × 224 × 3.
first_indexed	2024-03-11T05:40:04Z
format	Article
id	doaj.art-9f94790224c24541be1f5a770e6a71b8
institution	Directory Open Access Journal
issn	2079-9292
language	English
last_indexed	2024-03-11T05:40:04Z
publishDate	2023-03-01
publisher	MDPI AG
record_format	Article
series	Electronics
spelling	doaj.art-9f94790224c24541be1f5a770e6a71b8; 2023-11-17T16:32:29Z; eng; MDPI AG; Electronics; 2079-9292; 2023-03-01; 12; 7; 1571; 10.3390/electronics12071571; A High-Performance FPGA-Based Depthwise Separable Convolution Accelerator; Jiye Huang (The School of Electronics and Information, Hangzhou Dianzi University, Hangzhou 310018, China); Xin Liu (The School of Electronics and Information, Hangzhou Dianzi University, Hangzhou 310018, China); Tongdong Guo (The School of Electronics and Information, Hangzhou Dianzi University, Hangzhou 310018, China); Zhijin Zhao (The School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China); https://www.mdpi.com/2079-9292/12/7/1571; convolutional neural network; depthwise separable convolution; field programmable gate array; hardware accelerator; MobileNetV2
title	A High-Performance FPGA-Based Depthwise Separable Convolution Accelerator
topic	convolutional neural network; depthwise separable convolution; field programmable gate array; hardware accelerator; MobileNetV2
url	https://www.mdpi.com/2079-9292/12/7/1571
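
The abstract's claim that DSC reduces parameters and operations can be illustrated with a simple count. The sketch below is not taken from the article: the function names and the example layer sizes are hypothetical, bias terms are ignored, and both variants are assumed to produce the same output size. It compares a standard k × k convolution with a depthwise k × k convolution followed by a pointwise 1 × 1 convolution.

```python
def standard_conv_cost(k, c_in, c_out, h_out, w_out):
    """Return (parameters, MACs) of a standard k x k convolution, bias ignored."""
    params = k * k * c_in * c_out
    macs = params * h_out * w_out  # every weight is used once per output pixel
    return params, macs


def dsc_cost(k, c_in, c_out, h_out, w_out):
    """Return (parameters, MACs) of a depthwise k x k plus pointwise 1 x 1 pair."""
    dw_params = k * k * c_in   # one k x k filter per input channel (DWC)
    pw_params = c_in * c_out   # 1 x 1 convolution that mixes channels (PWC)
    params = dw_params + pw_params
    macs = params * h_out * w_out
    return params, macs


# Hypothetical layer: 3 x 3 kernel, 32 -> 64 channels, 112 x 112 output.
std_params, std_macs = standard_conv_cost(3, 32, 64, 112, 112)
dsc_params, dsc_macs = dsc_cost(3, 32, 64, 112, 112)

# Under these assumptions the ratio equals 1/c_out + 1/k^2.
ratio = dsc_params / std_params
print(std_params, dsc_params, round(ratio, 4))
```

For this assumed layer the DSC pair needs roughly one eighth of the parameters and multiply-accumulate operations of the standard convolution; the ratio depends only on the kernel size and the number of output channels.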

Similar Items