A Low-Cost Fully Integer-Based CNN Accelerator on FPGA for Real-Time Traffic Sign Recognition

Bibliographic Details
Main Authors: Jaemyung Kim, Jin-Ku Kang, Yongwoo Kim
Format: Article
Language: English
Published: IEEE 2022-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9853508/
Description: Traffic sign recognition (TSR) technology allows a vehicle to recognize road signs through a camera and use them for driving. For traffic safety, TSR is one of the core technologies of advanced driver assistance systems (ADAS) and has been the subject of several studies. The advent of convolutional neural networks (CNNs) has opened up new possibilities in automotive environments, especially for ADAS. However, deploying a real-time TSR application in resource-constrained ADAS is challenging because most CNNs require substantial computing resources and memory. To address this problem, some works have considered optimization for embedded platforms, but they either used many hardware resources or showed low computation performance. In this paper, we propose a low-cost CNN-based real-time TSR hardware accelerator. Firstly, we extend a novel hardware-friendly quantization method to reduce computational complexity. The quantization method reconstructs the CNN so that all operations, including the skip connection path of residual blocks, use only integer arithmetic, and it reduces the computational overhead by replacing the quantization affine mapping process with a shift operation. Secondly, the proposed hardware accelerator applies two parallelization strategies to balance real-time inference and resource consumption. In addition, we present a simple and effective hardware design scheme that handles the skip connection path of residual blocks. This design scheme optimizes the dataflow of the skip connection path and reduces additional internal memory usage. Experimental results show that the reconstructed fully integer-based CNN requires only 24M integer operations (IOPs) and has a model size of 0.17 MB. Compared with the previous work, the proposed CNN model size was reduced by a factor of 105, and the number of operations was reduced by a factor of 58. In addition, the proposed CNN achieves a TSR accuracy of 99.07%, which is the highest accuracy among CNN-based TSR works implemented on embedded platforms. The proposed hardware accelerator achieves a computation performance of 960 MOPS and a frame rate of 40 FPS when implemented on a Xilinx ZC706 SoC. Consequently, this work improves computation performance by a factor of 11.87 and frame rate by a factor of 36.7 compared to the previous work.
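The abstract states that the affine mapping of quantization is replaced with a shift operation and that the skip connection path of residual blocks uses only integer arithmetic. The record does not give the authors' actual scheme, so the following is only a minimal sketch of the general idea, assuming power-of-two scales (each tensor is an integer with a known number of fractional bits). The function names, the 8-bit default, and the rounding rule are illustrative assumptions, not taken from the paper.

```python
def requantize_shift(acc: int, shift: int, bits: int = 8) -> int:
    """Rescale an integer accumulator by 2**-shift using only add and shift.

    Assumption: the scale is a power of two, so no multiplier is needed.
    Rounds to nearest (ties toward +infinity), then clamps to the signed
    range of `bits` bits.
    """
    if shift > 0:
        acc = (acc + (1 << (shift - 1))) >> shift
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, acc))


def residual_add(main_q: int, main_frac: int,
                 skip_q: int, skip_frac: int,
                 out_frac: int, bits: int = 8) -> int:
    """Integer-only residual addition (hypothetical helper).

    Each operand represents the real value q / 2**frac. Both operands are
    aligned to the larger fractional width with left shifts, summed, and
    then requantized to `out_frac` fractional bits with a right shift.
    """
    common = max(main_frac, skip_frac)
    assert out_frac <= common, "sketch only handles narrowing to the output"
    total = (main_q << (common - main_frac)) + (skip_q << (common - skip_frac))
    return requantize_shift(total, common - out_frac, bits)


if __name__ == "__main__":
    # 40/32 + 12/8 = 1.25 + 1.5 = 2.75, which is 44/16 with 4 fractional bits.
    print(residual_add(40, 5, 12, 3, 4))
```

In this sketch the main path and the skip path never leave the integer domain: alignment and rescaling are shifts, which is the property the abstract attributes to the proposed quantization method.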
ISSN: 2169-3536
Source: Directory of Open Access Journals (DOAJ), record doaj.art-b568c0e30ffe47b38b863171dd455b2f
Volume: 10
Pages: 84626-84634
DOI: 10.1109/ACCESS.2022.3197906
Author Details:
Jaemyung Kim (https://orcid.org/0000-0001-8448-9680), Department of Electrical and Computer Engineering, Inha University, Incheon, South Korea
Jin-Ku Kang (https://orcid.org/0000-0002-3752-3740), Department of Electrical and Computer Engineering, Inha University, Incheon, South Korea
Yongwoo Kim (https://orcid.org/0000-0002-1011-2319), Department of System Semiconductor Engineering, Sangmyung University, Cheonan, South Korea
Keywords: Traffic sign recognition, CNN, quantization, accelerator, FPGA