A digital signal processor‐efficient accelerator for depthwise separable convolution

Abstract Recent research on deep convolutional neural networks has proposed compact networks such as MobileNet, but their main computation, depthwise separable convolution (DWC), reduces the amount of reusable data and raises the demands on data loading efficiency. Although DWC can effectivel...

Bibliographic Details
Main Authors: Xueming Li, Hongmin Huang, Yuan Liu, Xianghong Hu, Xiaoming Xiong
Format: Article
Language: English
Published: Wiley 2022-03-01
Series: Electronics Letters
Online Access: https://doi.org/10.1049/ell2.12435
_version_ 1811241803556323328
author Xueming Li
Hongmin Huang
Yuan Liu
Xianghong Hu
Xiaoming Xiong
author_sort Xueming Li
collection DOAJ
description Abstract Recent research on deep convolutional neural networks has proposed compact networks such as MobileNet, but their main computation, depthwise separable convolution (DWC), reduces the amount of reusable data and raises the demands on data loading efficiency. Although DWC effectively reduces the amount of network computation, it needs a dedicated accelerator to improve inference speed. This paper proposes a high‐performance accelerator for DWC based on the field‐programmable gate array, a commonly used acceleration platform. The proposed accelerator supports the computation of both standard convolution (SC) and DWC as well as two activation functions. In addition, two data storage formats are used to maintain data loading efficiency for the different input requirements of SC and DWC under high parallelism. Furthermore, a processing unit that can execute two 8 × 8‐bit multiplications inside one digital signal processor (DSP) is designed to make the best use of the DSP hardware resources. Finally, the accelerator is implemented on a ZYNQ ZC706 at 200 MHz. Consuming only 392 DSPs, the accelerator achieves 134.5 giga operations per second (GOPS) and 209.4 frames per second (FPS) on MobileNet V1 as well as 96.4 GOPS and 250.4 FPS on MobileNet V2. Experimental results demonstrate that this design provides better DSP efficiency than previous work.
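The abstract mentions a processing unit that performs two 8 × 8‐bit multiplications inside one DSP but does not say how. The sketch below shows one common way to get two signed 8‐bit products from a single wide multiplication when both products share an operand. It is an illustration under stated assumptions, not the circuit from the paper: the function name, the 16‐bit shift, and the shared‐operand arrangement are hypothetical, and the code does not model the port widths of any particular DSP slice.

```python
from typing import Tuple

SHIFT = 16  # assumed field width; large enough to hold one signed int8 x int8 product


def packed_dual_multiply(a: int, b: int, c: int) -> Tuple[int, int]:
    """Return (a*c, b*c) for signed 8-bit a, b, c using one wide multiplication."""
    for v in (a, b, c):
        if not -128 <= v <= 127:
            raise ValueError("operands must be signed 8-bit values")

    packed = (a << SHIFT) + b            # one wide operand carrying both a and b
    product = packed * c                 # single multiply: a*c*2**SHIFT + b*c

    low = product & ((1 << SHIFT) - 1)   # low field holds b*c modulo 2**SHIFT
    if low >= 1 << (SHIFT - 1):          # reinterpret the low field as signed
        low -= 1 << SHIFT

    high = (product - low) >> SHIFT      # subtract b*c, then shift down to get a*c
    return high, low
```

For example, `packed_dual_multiply(3, -5, 7)` returns `(21, -35)`. The subtraction before the shift is the correction step: a negative low product would otherwise borrow from the high field and leave `a*c` off by one.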
first_indexed 2024-04-12T13:42:35Z
format Article
id doaj.art-fac09823ce1246dd9436f3ec67b326ef
institution Directory Open Access Journal
issn 0013-5194
1350-911X
language English
last_indexed 2024-04-12T13:42:35Z
publishDate 2022-03-01
publisher Wiley
record_format Article
series Electronics Letters
spelling doaj.art-fac09823ce1246dd9436f3ec67b326ef
2022-12-22T03:30:49Z
eng
Wiley
Electronics Letters
0013-5194
1350-911X
2022-03-01
volume 58, issue 7, pages 271-273
10.1049/ell2.12435
A digital signal processor‐efficient accelerator for depthwise separable convolution
Xueming Li, School of Automation, Guangdong University of Technology, Guangzhou, China
Hongmin Huang, School of Automation, Guangdong University of Technology, Guangzhou, China
Yuan Liu, School of Microelectronics, Guangdong University of Technology, Guangzhou, China
Xianghong Hu, School of Microelectronics, Guangdong University of Technology, Guangzhou, China
Xiaoming Xiong, School of Microelectronics, Guangdong University of Technology, Guangzhou, China
https://doi.org/10.1049/ell2.12435
title A digital signal processor‐efficient accelerator for depthwise separable convolution
url https://doi.org/10.1049/ell2.12435