L4Net: An anchor‐free generic object detector with attention mechanism for autonomous driving

Abstract: Generic object detection is a crucial task for autonomous driving. To devise a safe and efficient object detector, three aspects must be considered: high accuracy, real-time inference speed and small model size. Herein, a simple yet effective anchor-free object detector named L4Net is proposed, which incorporates a keypoint detection backbone and a co-attention scheme into a unified framework, and achieves lower computation cost with higher detection accuracy than prior art across a wide spectrum of resource constraints. Specifically, the backbone uses a Multi-scale Receptive-fields Enhancement (MRE) module to capture contextual information, considering object scale and shape invariance simultaneously. The co-attention scheme integrates the strengths of both Class-agnostic Attention (CA) and Semantic Attention (SA), and explores valuable features from low level to high level to generate more accurate prediction boxes. Compared with previous feature fusion strategies, multi-scale features are selectively integrated by fully exploiting the different characteristics of low-level and high-level features, which leads to a smaller model size and faster inference speed. Extensive experiments on four well-known datasets demonstrate the effectiveness of the method. For instance, L4Net achieves 71.68% mAP on the KITTI test set with a 13.7 M model size, running at 149 FPS on an NVIDIA TX device and 30.7 FPS on a Qualcomm-based device, which is 4x smaller and 2x faster than the baseline model.
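The co-attention idea described in the abstract — gating a blend of low-level and high-level features by the agreement between class-agnostic and semantic attention — can be sketched as follows. This is an illustrative reconstruction only, not the authors' code: the function names, tensor shapes, and the exact blending rule are assumptions.

```python
import numpy as np

def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def softmax_channels(x):
    """Softmax over the channel (first) axis, numerically stabilised."""
    e = np.exp(x - x.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def co_attention_fuse(low_feat, high_feat, ca_logits, sa_logits):
    """Hypothetical co-attention fusion of two feature maps.

    low_feat, high_feat: (C, H, W) feature maps at the same resolution.
    ca_logits: (1, H, W) class-agnostic objectness logits (CA branch).
    sa_logits: (K, H, W) per-class semantic logits (SA branch).
    """
    ca = sigmoid(ca_logits)                       # where objects are, class-agnostic
    sa = softmax_channels(sa_logits).max(axis=0, keepdims=True)  # strongest class evidence
    att = ca * sa                                 # co-attention: both branches must agree
    # Selective integration: attended regions favour detail-rich low-level
    # features, the rest falls back to semantically strong high-level features.
    return att * low_feat + (1.0 - att) * high_feat
```

Because `att` lies in (0, 1), the output is an elementwise convex combination of the two inputs, so fusion never amplifies either feature map beyond its original range.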

Bibliographic Details
Main Authors: Yanan Wu, Songhe Feng, Xiankai Huang, Zizhang Wu
Format: Article
Language:English
Published: Wiley 2021-02-01
Series:IET Computer Vision
Online Access:https://doi.org/10.1049/cvi2.12015
Collection: Directory of Open Access Journals (DOAJ)
ISSN: 1751-9632, 1751-9640
Published in: IET Computer Vision, vol. 15, no. 1 (February 2021), pp. 36–46. DOI: 10.1049/cvi2.12015
Author affiliations:
Yanan Wu, Songhe Feng: School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
Xiankai Huang: Beijing Technology and Business University, Beijing, China
Zizhang Wu: ZongMu Tech, Shanghai, China