A Full Featured Configurable Accelerator for Object Detection With YOLO


Bibliographic Details
Main Authors: Daniel Pestana, Pedro R. Miranda, Joao D. Lopes, Rui P. Duarte, Mario P. Vestias, Horacio C. Neto, Jose T. De Sousa
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: Object detection; convolutional neural network; FPGA; lightweight YOLO
Online Access:https://ieeexplore.ieee.org/document/9435338/
author Daniel Pestana
Pedro R. Miranda
Joao D. Lopes
Rui P. Duarte
Mario P. Vestias
Horacio C. Neto
Jose T. De Sousa
collection DOAJ
description Object detection and classification are essential tasks in computer vision. YOLO (You Only Look Once) is a very efficient algorithm for detection and classification. We consider hardware architectures that run YOLO in real time on embedded platforms. Designing a new dedicated accelerator for each new version of YOLO is not feasible, given how quickly new versions are released. The primary goal of this work is to design a configurable and scalable core for creating specific YOLO-based object detection and classification systems targeting embedded platforms. The core accelerates every step of the algorithm, including pre-processing, model inference and post-processing. It uses a fixed-point format, linearised activation functions, batch-normalisation, folding, and a hardware structure that exploits most of the parallelism available in CNN processing. The proposed core is configured for real-time execution of YOLOv3-Tiny and YOLOv4-Tiny, integrated into a RISC-V-based system-on-chip architecture, and prototyped on an UltraScale XCKU040 FPGA (Field Programmable Gate Array). With a 16-bit fixed-point format, the solution achieves 32 and 31 frames per second for YOLOv3-Tiny and YOLOv4-Tiny, respectively. Compared to previous proposals, it achieves a higher frame rate with higher performance efficiency. The performance, area efficiency and configurability of the proposed core enable the fast development of real-time YOLO-based object detectors for embedded systems.
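The description names a 16-bit fixed-point format, batch-normalisation and folding without giving details. One common reading of "folding" is merging the batch-normalisation parameters into the preceding convolution's weights and bias, so the hardware only has to perform a multiply-accumulate. The Python sketch below illustrates that idea under stated assumptions: the Q7.8 split of the 16-bit word (8 fraction bits), the per-channel parameter layout and the function names `fold_batchnorm`, `to_fixed`, `from_fixed` and `fixed_mac` are all hypothetical and do not come from the paper's core.

```python
import math

# Assumption: a Q7.8 split (1 sign bit, 7 integer bits, 8 fraction bits).
# The description only states "16-bit fixed-point".
FRAC_BITS = 8
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (conv(x) + bias - mean) / sqrt(var + eps) + beta
    into one convolution with rescaled weights and a new bias.

    weights: one list of kernel weights per output channel.
    bias, gamma, beta, mean, var: one value per output channel.
    """
    folded_w, folded_b = [], []
    for c, kernel in enumerate(weights):
        scale = gamma[c] / math.sqrt(var[c] + eps)
        folded_w.append([w * scale for w in kernel])
        folded_b.append(beta[c] + (bias[c] - mean[c]) * scale)
    return folded_w, folded_b


def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantise a real value to a saturated signed 16-bit fixed-point integer."""
    q = round(x * (1 << frac_bits))
    return max(INT16_MIN, min(INT16_MAX, q))


def from_fixed(q, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a float (for checking only)."""
    return q / (1 << frac_bits)


def fixed_mac(inputs_q, weights_q, bias_q, frac_bits=FRAC_BITS):
    """Fixed-point dot product plus bias.

    Each product carries 2 * frac_bits fraction bits, so the wide accumulator
    is arithmetically shifted right by frac_bits (rounding toward minus
    infinity) before the bias is added and the result is saturated to 16 bits.
    """
    acc = sum(a * w for a, w in zip(inputs_q, weights_q))
    acc = (acc >> frac_bits) + bias_q
    return max(INT16_MIN, min(INT16_MAX, acc))
```

For example, inputs 1.5 and 2.0 (`384`, `512` in Q7.8), weights 1.0 and -0.5 (`256`, `-128`) and a bias of 0.25 (`64`) give `fixed_mac(...) == 192`, which `from_fixed` maps back to 0.75, matching the floating-point result. Saturating rather than wrapping on overflow is a design choice assumed here; it keeps out-of-range activations at the extremes of the 16-bit range.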
first_indexed 2024-12-17T07:22:30Z
format Article
id doaj.art-6769f844dca2477b83480a2b03ef65a3
institution Directory of Open Access Journals
issn 2169-3536
language English
last_indexed 2024-12-17T07:22:30Z
publishDate 2021-01-01
publisher IEEE
record_format Article
series IEEE Access
doi 10.1109/ACCESS.2021.3081818
volume 9
pages 75864-75877
article_number 9435338
affiliation Daniel Pestana, Pedro R. Miranda, Joao D. Lopes, Rui P. Duarte, Horacio C. Neto, Jose T. De Sousa: INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
Mario P. Vestias: INESC-ID, Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, Lisboa, Portugal
orcid Joao D. Lopes https://orcid.org/0000-0002-8903-9715
Rui P. Duarte https://orcid.org/0000-0002-7060-4745
Mario P. Vestias https://orcid.org/0000-0001-8556-4507
Horacio C. Neto https://orcid.org/0000-0002-3621-8322
Jose T. De Sousa https://orcid.org/0000-0001-7525-7546
title A Full Featured Configurable Accelerator for Object Detection With YOLO
topic Object detection
convolutional neural network
FPGA
lightweight YOLO
url https://ieeexplore.ieee.org/document/9435338/