A Reconfigurable CNN-Based Accelerator Design for Fast and Energy-Efficient Object Detection System on Mobile FPGA

In limited-resource edge computing circumstances such as on mobile devices, IoT devices, and electric vehicles, the energy-efficient optimized convolutional neural network (CNN) accelerator implemented on mobile Field Programmable Gate Array (FPGA) is becoming more attractive due to its high accurac...


Bibliographic Details
Main Authors:	Victoria Heekyung Kim, Kyuwon Ken Choi
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Access
Subjects:	FPGA accelerator; CNN accelerator; RT level design techniques; low power techniques; reconfigurable accelerator; CNN-based object detection
Online Access:	https://ieeexplore.ieee.org/document/10148988/

_version_	1797799300495835136
author	Victoria Heekyung Kim; Kyuwon Ken Choi
author_facet	Victoria Heekyung Kim; Kyuwon Ken Choi
author_sort	Victoria Heekyung Kim
collection	DOAJ
description	In resource-limited edge computing environments such as mobile devices, IoT devices, and electric vehicles, energy-efficient convolutional neural network (CNN) accelerators implemented on mobile Field Programmable Gate Arrays (FPGAs) are becoming more attractive due to their high accuracy and scalability. Mobile FPGAs such as the Xilinx PYNQ-Z1/Z2 and Ultra96 offer scalability and flexibility for implementing deep learning-based object detection applications. They are also suitable for battery-powered systems, especially drones and electric vehicles, where energy efficiency in terms of power consumption and size matters. However, their performance is too limited for real-time processing. This article introduces an optimized accelerator design flow at the register-transfer level (RTL) that achieves fast programming speed by applying low-power techniques to the FPGA accelerator implementation. Most accelerator optimization techniques are applied at the system level on the FPGA. In contrast, we propose a reconfigurable accelerator design for a CNN-based object detection system at the register-transfer level on a mobile FPGA. Furthermore, we present RTL optimization techniques, such as various types of clock gating, that eliminate residual signals and deactivate unnecessarily active blocks. Based on an analysis of the CNN-based object detection architecture, we identify and classify the common computing components of the convolutional neural network, such as multipliers and adders. We implement the multiplier/adder unit as a universal computing unit and modularize it to suit a hierarchical RTL code structure. The proposed design was tested with ResNet-20, which has 23 layers and was trained on the CIFAR-10 dataset, which provides a test set of 10,000 images in several formats; the weight data used in this experiment was provided by Tensil. Experimental results show that the proposed design process improves power consumption, hardware utilization, and throughput by 16%, up to 58%, and 15%, respectively.
first_indexed	2024-03-13T04:17:53Z
format	Article
id	doaj.art-8dd016694e2544e6956b081048872470
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-03-13T04:17:53Z
publishDate	2023-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-8dd016694e2544e6956b081048872470; 2023-06-20T23:00:39Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2023-01-01; vol. 11, pp. 59438-59445; doi:10.1109/ACCESS.2023.3285279; 10148988; A Reconfigurable CNN-Based Accelerator Design for Fast and Energy-Efficient Object Detection System on Mobile FPGA; Victoria Heekyung Kim (https://orcid.org/0000-0002-8543-0792), Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL, USA; Kyuwon Ken Choi, Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL, USA; https://ieeexplore.ieee.org/document/10148988/; FPGA accelerator; CNN accelerator; RT level design techniques; low power techniques; reconfigurable accelerator; CNN-based object detection
title	A Reconfigurable CNN-Based Accelerator Design for Fast and Energy-Efficient Object Detection System on Mobile FPGA
title_sort	reconfigurable cnn based accelerator design for fast and energy efficient object detection system on mobile fpga
topic	FPGA accelerator; CNN accelerator; RT level design techniques; low power techniques; reconfigurable accelerator; CNN-based object detection
url	https://ieeexplore.ieee.org/document/10148988/


Similar Items