A Reconfigurable CNN-Based Accelerator Design for Fast and Energy-Efficient Object Detection System on Mobile FPGA
In limited-resource edge computing circumstances such as on mobile devices, IoT devices, and electric vehicles, the energy-efficient optimized convolutional neural network (CNN) accelerator implemented on mobile Field Programmable Gate Array (FPGA) is becoming more attractive due to its high accurac...
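Main Authors: | Victoria Heekyung Kim, Kyuwon Ken Choi |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023-01-01 |
Series: | IEEE Access |
Subjects: | FPGA accelerator; CNN accelerator; RT level design techniques; low power techniques; reconfigurable accelerator; CNN-based object detection |
Online Access: | https://ieeexplore.ieee.org/document/10148988/ |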
_version_ | 1797799300495835136 |
---|---|
author | Victoria Heekyung Kim; Kyuwon Ken Choi |
author_facet | Victoria Heekyung Kim; Kyuwon Ken Choi |
author_sort | Victoria Heekyung Kim |
collection | DOAJ |
description | In resource-limited edge computing settings such as mobile devices, IoT devices, and electric vehicles, energy-efficient convolutional neural network (CNN) accelerators implemented on mobile Field Programmable Gate Arrays (FPGAs) are becoming more attractive due to their high accuracy and scalability. Mobile FPGAs such as the Xilinx PYNQ-Z1/Z2 and Ultra96 offer scalability and flexibility for implementing deep learning-based object detection applications. They are also suitable for battery-powered systems, especially drones and electric vehicles, where energy efficiency in terms of power consumption and size matters. However, their limited performance makes real-time processing difficult to achieve. This article introduces an optimized accelerator design flow at the register-transfer level (RTL) that achieves fast programming speed by applying low-power techniques to the FPGA accelerator implementation. In general, most accelerator optimization techniques are applied at the system level on the FPGA. In this article, we propose a reconfigurable accelerator design for a CNN-based object detection system at the register-transfer level on a mobile FPGA. Furthermore, we present RTL optimization techniques, such as various types of clock gating, that eliminate residual signals and deactivate unnecessarily active blocks. Based on an analysis of the CNN-based object detection architecture, we identify and classify the common computing components of the convolutional neural network, such as multipliers and adders. We implement the multiplier/adder unit as a universal computing unit and modularize it to suit a hierarchical RTL code structure. The proposed design was tested with ResNet-20, which has 23 layers and was trained on the CIFAR-10 dataset, which provides a test set of 10,000 images in several formats; the weight data used for this experiment was provided by Tensil. Experimental results show that the proposed design process improves power consumption, hardware utilization, and throughput by 16%, up to 58%, and 15%, respectively. |
first_indexed | 2024-03-13T04:17:53Z |
format | Article |
id | doaj.art-8dd016694e2544e6956b081048872470 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-13T04:17:53Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-8dd016694e2544e6956b081048872470; 2023-06-20T23:00:39Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; 11; 59438; 59445; 10.1109/ACCESS.2023.3285279; 10148988; A Reconfigurable CNN-Based Accelerator Design for Fast and Energy-Efficient Object Detection System on Mobile FPGA; Victoria Heekyung Kim [0] https://orcid.org/0000-0002-8543-0792; Kyuwon Ken Choi [1]; [0] Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL, USA; [1] Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL, USA; https://ieeexplore.ieee.org/document/10148988/; FPGA accelerator; CNN accelerator; RT level design techniques; low power techniques; reconfigurable accelerator; CNN-based object detection |
spellingShingle | Victoria Heekyung Kim Kyuwon Ken Choi A Reconfigurable CNN-Based Accelerator Design for Fast and Energy-Efficient Object Detection System on Mobile FPGA IEEE Access FPGA accelerator CNN accelerator RT level design techniques low power techniques reconfigurable accelerator CNN-based object detection |
title | A Reconfigurable CNN-Based Accelerator Design for Fast and Energy-Efficient Object Detection System on Mobile FPGA |
title_sort | reconfigurable cnn based accelerator design for fast and energy efficient object detection system on mobile fpga |
topic | FPGA accelerator; CNN accelerator; RT level design techniques; low power techniques; reconfigurable accelerator; CNN-based object detection |
url | https://ieeexplore.ieee.org/document/10148988/ |
work_keys_str_mv | AT victoriaheekyungkim areconfigurablecnnbasedacceleratordesignforfastandenergyefficientobjectdetectionsystemonmobilefpga AT kyuwonkenchoi areconfigurablecnnbasedacceleratordesignforfastandenergyefficientobjectdetectionsystemonmobilefpga AT victoriaheekyungkim reconfigurablecnnbasedacceleratordesignforfastandenergyefficientobjectdetectionsystemonmobilefpga AT kyuwonkenchoi reconfigurablecnnbasedacceleratordesignforfastandenergyefficientobjectdetectionsystemonmobilefpga |