L4Net: An anchor‐free generic object detector with attention mechanism for autonomous driving
Abstract Generic object detection is a crucial task for autonomous driving. To devise a safe and efficient object detector, the following aspects must be considered: high accuracy, real‐time inference speed and small model size. Herein, a simple yet effective anchor‐free object detector named L4Net is proposed, which incorporates a keypoint detection backbone and a co‐attention scheme into a unified framework, and achieves lower computation cost with higher detection accuracy than prior art across a wide spectrum of resource constraints.
Main Authors: | Yanan Wu, Songhe Feng, Xiankai Huang, Zizhang Wu |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2021-02-01 |
Series: | IET Computer Vision |
Online Access: | https://doi.org/10.1049/cvi2.12015 |
---|---|
author | Yanan Wu Songhe Feng Xiankai Huang Zizhang Wu |
collection | DOAJ |
description | Abstract Generic object detection is a crucial task for autonomous driving. To devise a safe and efficient object detector, the following aspects must be considered: high accuracy, real‐time inference speed and small model size. Herein, a simple yet effective anchor‐free object detector named L4Net is proposed, which incorporates a keypoint detection backbone and a co‐attention scheme into a unified framework, and achieves lower computation cost with higher detection accuracy than prior art across a wide spectrum of resource constraints. Specifically, the backbone utilizes a Multi‐scale Receptive‐fields Enhancement (MRE) module to capture context‐wise information, where the features of object scale and shape invariance are simultaneously considered. The co‐attention scheme integrates the strengths of both Class‐agnostic Attention (CA) and Semantic Attention (SA), and exploits valuable features from low level to high level to generate more accurate prediction boxes. Compared with previous feature fusion strategies, multi‐scale features are selectively integrated by fully exploiting the different characteristics of low‐level and high‐level features, which leads to a smaller model size and faster inference speed. Extensive experiments on four well‐known datasets demonstrate the effectiveness of the method. For instance, L4Net achieves 71.68% mAP on the KITTI test set with a 13.7 M model size, at a speed of 149 FPS on NVIDIA TX and 30.7 FPS on a Qualcomm‐based device, respectively, which is 4x smaller and 2x faster than the baseline model. |
format | Article |
id | doaj.art-94d0837e67fc4646bf379fa45538397e |
institution | Directory Open Access Journal |
issn | 1751-9632 1751-9640 |
language | English |
publishDate | 2021-02-01 |
publisher | Wiley |
record_format | Article |
series | IET Computer Vision |
spelling | doaj.art-94d0837e67fc4646bf379fa45538397e 2022-12-22T03:22:54Z eng. Wiley, IET Computer Vision (1751-9632, 1751-9640), 2021-02-01, vol. 15, no. 1, pp. 36–46, 10.1049/cvi2.12015. L4Net: An anchor‐free generic object detector with attention mechanism for autonomous driving. Yanan Wu (School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China); Songhe Feng (School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China); Xiankai Huang (Beijing Technology and Business University, Beijing, China); Zizhang Wu (ZongMu Tech, Shanghai, China). https://doi.org/10.1049/cvi2.12015 |
title | L4Net: An anchor‐free generic object detector with attention mechanism for autonomous driving |
url | https://doi.org/10.1049/cvi2.12015 |
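The abstract describes the co‐attention scheme only at a high level: Class‐agnostic Attention (CA) and Semantic Attention (SA) are combined to selectively fuse low‐level and high‐level features. As a rough illustration only, the NumPy sketch below shows one plausible realization under the assumption that CA acts as a spatial (location‐wise) gate on the detail‐rich low‐level map and SA acts as a channel‐wise gate on the semantics‐rich high‐level map; the function names and gating choices are hypothetical, not the paper's actual implementation, and upsampling of the high‐level map is omitted by assuming matching shapes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def class_agnostic_attention(feat):
    # Hypothetical CA: a single-channel spatial gate over H x W locations,
    # approximated here by the channel-wise mean response. feat: (C, H, W).
    spatial_gate = sigmoid(feat.mean(axis=0, keepdims=True))  # (1, H, W)
    return feat * spatial_gate

def semantic_attention(feat):
    # Hypothetical SA: per-channel weights from global average pooling,
    # emphasizing semantically informative channels. feat: (C, H, W).
    channel_gate = sigmoid(feat.mean(axis=(1, 2), keepdims=True))  # (C, 1, 1)
    return feat * channel_gate

def co_attention_fuse(low_level, high_level):
    # Selective fusion: gate each stream by its own attention before summing,
    # instead of plain element-wise addition of the raw feature maps.
    return class_agnostic_attention(low_level) + semantic_attention(high_level)

# Toy feature maps standing in for backbone outputs of matching shape.
low = np.random.rand(64, 32, 32).astype(np.float32)
high = np.random.rand(64, 32, 32).astype(np.float32)
fused = co_attention_fuse(low, high)
```

The design choice this sketch highlights is that each stream is re-weighted before fusion, so the low-level map contributes localization detail where its responses are strong while the high-level map contributes class semantics channel by channel, which is consistent with the abstract's claim of "selectively integrated" multi-scale features.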