L4Net: An anchor-free generic object detector with attention mechanism for autonomous driving

Bibliographic Details
Main Authors: Yanan Wu, Songhe Feng, Xiankai Huang, Zizhang Wu
Format: Article
Language: English
Published: Wiley 2021-02-01
Series: IET Computer Vision
Online Access:https://doi.org/10.1049/cvi2.12015
Description
Summary: Abstract Generic object detection is a crucial task for autonomous driving. To devise a safe and efficient object detector, the following aspects must be considered: high accuracy, real-time inference speed and small model size. Herein, a simple yet effective anchor-free object detector named L4Net is proposed, which incorporates a keypoint detection backbone and a co-attention scheme into a unified framework, and achieves lower computation cost with higher detection accuracy than prior art across a wide spectrum of resource constraints. Specifically, the backbone utilizes a Multi-scale Receptive-fields Enhancement module (MRE) to capture context-wise information, where the features of object scale and shape invariance are simultaneously considered. The co-attention scheme integrates the strength of both Class-agnostic Attention (CA) and Semantic Attention (SA), and exploits valuable features from low level to high level to generate more accurate prediction boxes. Compared with previous feature fusion strategies, multi-scale features are selectively integrated by fully exploiting the different characteristics of low-level and high-level features, which leads to a smaller model size and faster inference speed. Extensive experiments on four well-known datasets demonstrate the effectiveness of the method. For instance, L4Net achieves 71.68% mAP on the KITTI test set, with a 13.7 M model size at 149 FPS on NVIDIA TX and 30.7 FPS on a Qualcomm-based device, respectively, which is 4x smaller and 2x faster than the baseline model.
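The abstract describes a co-attention scheme that fuses low-level localisation features with high-level semantic features via Class-agnostic Attention (spatial, class-independent) and Semantic Attention (channel-wise). The article itself gives no code; the following NumPy sketch only illustrates that general idea. The function names, the pooling-based attention maps, and the additive fusion are assumptions for illustration, not L4Net's actual layers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def class_agnostic_attention(feat):
    # Spatial attention: a single (H, W) saliency map marking "objectness"
    # regardless of class, here approximated by a channel-mean + sigmoid.
    spatial = sigmoid(feat.mean(axis=0))        # (H, W), values in (0, 1)
    return feat * spatial[None, :, :]           # reweight every channel

def semantic_attention(feat):
    # Channel attention: per-channel weights emphasising channels that
    # carry class semantics, approximated by global average pooling.
    channel = sigmoid(feat.mean(axis=(1, 2)))   # (C,), values in (0, 1)
    return feat * channel[:, None, None]

def co_attention_fuse(low, high):
    # low:  (C, H, W) fine-grained localisation features
    # high: (C, H, W) semantic features, assumed already upsampled to
    #       the same resolution; fusion by simple addition (assumption).
    return class_agnostic_attention(low) + semantic_attention(high)

low = np.random.rand(8, 16, 16).astype(np.float32)
high = np.random.rand(8, 16, 16).astype(np.float32)
fused = co_attention_fuse(low, high)
print(fused.shape)  # (8, 16, 16)
```

The point of the sketch is the division of labour the abstract claims: the low-level branch is filtered spatially (where objects are), the high-level branch channel-wise (what they are), so the two resolutions are integrated selectively rather than concatenated wholesale.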
ISSN: 1751-9632, 1751-9640