A Reconfigurable CNN-Based Accelerator Design for Fast and Energy-Efficient Object Detection System on Mobile FPGA

Bibliographic Details
Main Authors: Victoria Heekyung Kim, Kyuwon Ken Choi
Format: Article
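Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects: FPGA accelerator; CNN accelerator; RT level design techniques; low power techniques; reconfigurable accelerator; CNN-based object detection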
Online Access:https://ieeexplore.ieee.org/document/10148988/
Collection: DOAJ (Directory of Open Access Journals)
Abstract
In limited-resource edge computing environments such as mobile devices, IoT devices, and electric vehicles, energy-efficient convolutional neural network (CNN) accelerators implemented on mobile Field Programmable Gate Arrays (FPGAs) are becoming more attractive because of their high accuracy and scalability. Recent mobile FPGAs such as the Xilinx PYNQ-Z1/Z2 and Ultra96 offer scalability and flexibility for implementing object detection applications based on deep learning algorithms. They are also suitable for battery-powered systems, especially drones and electric vehicles, where energy efficiency in terms of power consumption and size is essential. However, their limited performance makes real-time processing difficult to achieve. In this article, we introduce an optimized accelerator design flow at the register-transfer level (RTL) that applies low-power techniques to the FPGA accelerator implementation to achieve fast programming speed. By contrast, most accelerator optimization techniques are applied at the system level on the FPGA. We propose a reconfigurable accelerator design at the register-transfer level for a CNN-based object detection system on a mobile FPGA. Furthermore, we present the RTL optimization techniques applied to this design, such as several types of clock gating that eliminate residual signals and deactivate unnecessarily active blocks. Based on an analysis of the CNN-based object detection architecture, we identify and classify the computing operations common throughout the convolutional neural network, such as multipliers and adders. We implement the multiplier/adder unit as a universal computing unit and modularize it to fit the hierarchical structure of the RTL code.

The proposed system design was tested with ResNet-20, which has 23 layers, trained on the CIFAR-10 dataset, which provides a test set of 10,000 images in several formats; the weight data used in this experiment was provided by Tensil. Experimental results show that the proposed design process improves power efficiency, hardware utilization, and throughput by 16%, up to 58%, and 15%, respectively.
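The clock-gating idea summarized in the abstract can be illustrated with a toy cycle-level Python model. This is not the authors' RTL: the class name `UniversalComputeUnit`, the 16-bit register width, the MUL/ADD/MAC modes, the example schedule, and the use of register bit toggles as a rough proxy for dynamic power are all assumptions made for illustration. The sketch shows why cutting the clock to an idle shared multiplier/adder unit avoids switching caused by residual values on its inputs.

```python
from dataclasses import dataclass
from enum import Enum


class Op(Enum):
    """Operation modes of a hypothetical shared multiplier/adder unit."""
    MUL = 0
    ADD = 1
    MAC = 2  # multiply-accumulate: acc + a * b


@dataclass
class UniversalComputeUnit:
    """Toy cycle-level model of one shared arithmetic unit (not the paper's RTL).

    Results wrap modulo 2**width, like a fixed-width hardware register.
    """
    width: int = 16       # assumed register width
    acc: int = 0          # output/accumulator register
    toggles: int = 0      # total register bit flips, a rough dynamic-power proxy

    def tick(self, op: Op, a: int, b: int, enable: bool = True) -> int:
        # A gated-off cycle delivers no clock edge: the register holds its
        # value and none of its bits switch.
        if not enable:
            return self.acc
        mask = (1 << self.width) - 1
        if op is Op.MUL:
            new = (a * b) & mask
        elif op is Op.ADD:
            new = (a + b) & mask
        else:  # Op.MAC
            new = (self.acc + a * b) & mask
        # Count the register bits that change on this clock edge.
        self.toggles += bin(self.acc ^ new).count("1")
        self.acc = new
        return new


def run(schedule, gated: bool) -> UniversalComputeUnit:
    """Clock a fresh unit through a schedule of (op, a, b, busy) cycles."""
    unit = UniversalComputeUnit()
    for op, a, b, busy in schedule:
        # Without gating, the unit is clocked every cycle, even on idle cycles
        # whose inputs are residual values and whose result nobody uses.
        unit.tick(op, a, b, enable=busy or not gated)
    return unit


# Hypothetical workload: three useful cycles and one idle cycle whose
# inputs (255, 255) are stale values left on the operand buses.
SCHEDULE = [
    (Op.MUL, 3, 5, True),
    (Op.ADD, 15, 1, True),
    (Op.MUL, 255, 255, False),
    (Op.ADD, 16, 4, True),
]

if __name__ == "__main__":
    for gated in (False, True):
        unit = run(SCHEDULE, gated)
        print(f"gated={gated}: result={unit.acc}, toggles={unit.toggles}")
```

In this made-up schedule both runs end with the same result (20), but gating the single idle cycle reduces register toggles from 28 to 10. The numbers only illustrate the mechanism and are unrelated to the measured improvements reported in the article.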
ISSN: 2169-3536
Volume: 11
Pages: 59438-59445
DOI: 10.1109/ACCESS.2023.3285279
Affiliation (both authors): Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL, USA
ORCID (Victoria Heekyung Kim): https://orcid.org/0000-0002-8543-0792