RTL3D: real‐time LIDAR‐based 3D object detection with sparse CNN

LIDAR (light detection and ranging) based real‐time 3D perception is crucial for applications such as autonomous driving. However, most convolutional neural network (CNN) based methods are time‐consuming and computation‐intensive. These drawbacks are mainly attributed to the highly variable density of LIDAR point clouds and the complexity of their pipelines. To find a balance between speed and accuracy for 3D object detection from LIDAR, the authors propose RTL3D, a computationally efficient real‐time LIDAR‐based 3D detector. In RTL3D, an effective voxel‐wise feature representation is used to organise the unstructured point cloud. By employing a sparse feature learning network (SFLN) on the voxelised 3D data, RTL3D exploits the sparsity of the point cloud and down‐samples the 3D data into a 2D representation. Based on the generated 2D feature map, an optimised dense detection network (DDN) regresses oriented bounding boxes without relying on any predefined anchor boxes. The authors also introduce an incremental data augmentation approach that greatly improves the performance of RTL3D. Experiments on the public KITTI benchmark demonstrate that RTL3D achieves performance competitive with state‐of‐the‐art works on the 3D detection task. Owing to the simplicity of its single‐stage, anchor‐free design, RTL3D runs at a real‐time inference speed of 40 FPS.
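The pipeline summarised above begins by organising the raw, unstructured point cloud into a voxel‐wise representation that a sparse CNN can consume. The Python sketch below is only a rough illustration of that voxelisation step, not the authors' implementation: the function name, the voxel size, the detection range and the per‐voxel mean feature are all assumptions made here for clarity.

```python
"""Minimal sketch of point-cloud voxelisation (illustrative only).

Not the RTL3D code: voxel_size, pc_range and the per-voxel mean
feature are assumed values chosen just to show the idea.
"""
import numpy as np


def voxelize(points, voxel_size=(0.2, 0.2, 0.4),
             pc_range=(0.0, -40.0, -3.0, 70.4, 40.0, 1.0)):
    """Group an (N, 4) LIDAR cloud (x, y, z, intensity) into occupied voxels.

    Returns integer voxel coordinates and a simple per-voxel feature
    (the mean of the points falling inside each voxel). Only occupied
    voxels are materialised, i.e. the output is a sparse representation.
    """
    points = np.asarray(points, dtype=np.float32)
    lo = np.array(pc_range[:3], dtype=np.float32)
    hi = np.array(pc_range[3:], dtype=np.float32)
    vs = np.array(voxel_size, dtype=np.float32)

    # Keep only points inside the detection range.
    mask = np.all((points[:, :3] >= lo) & (points[:, :3] < hi), axis=1)
    pts = points[mask]

    # Integer voxel index of every remaining point.
    coords = np.floor((pts[:, :3] - lo) / vs).astype(np.int32)

    # One row per occupied voxel; `inverse` maps each point to its voxel.
    uniq, inverse = np.unique(coords, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)

    # Mean (x, y, z, intensity) over the points in each occupied voxel.
    feats = np.zeros((len(uniq), pts.shape[1]), dtype=np.float32)
    np.add.at(feats, inverse, pts)
    counts = np.bincount(inverse, minlength=len(uniq)).astype(np.float32)
    feats /= counts[:, None]

    return uniq, feats


if __name__ == "__main__":
    # Synthetic cloud roughly matching the assumed detection range.
    cloud = np.random.rand(10000, 4).astype(np.float32)
    cloud[:, 0] *= 70.0                      # x in [0, 70) m
    cloud[:, 1] = cloud[:, 1] * 80.0 - 40.0  # y in [-40, 40) m
    cloud[:, 2] = cloud[:, 2] * 4.0 - 3.0    # z in [-3, 1) m
    coords, feats = voxelize(cloud)
    print(coords.shape, feats.shape)  # (#occupied voxels, 3) and (#occupied voxels, 4)
```

Because only occupied voxels are materialised, a downstream sparse network such as the SFLN can skip the empty space that dominates a LIDAR scan, which is the kind of sparsity the abstract says RTL3D exploits to reach real‐time speed.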

Bibliographic Details
Main Authors: Lin Yan, Kai Liu, Evgeny Belyaev, Meiyu Duan
Format: Article
Language: English
Published: Wiley, 2020-08-01
Series: IET Computer Vision, vol. 14, no. 5, pp. 224-232
ISSN: 1751-9632; 1751-9640
Author Affiliations: Lin Yan, Kai Liu and Meiyu Duan: School of Computer Science, Xidian University, 2 South Taibai Road, Xi'an, Shaanxi, People's Republic of China; Evgeny Belyaev: ITMO University, St Petersburg, Russia
Subjects: real-time LIDAR-based 3D object detection; convolutional neural network; LIDAR point cloud; real-time LIDAR-based 3D detector; sparse feature learning network; voxelised 3D data
Online Access: https://doi.org/10.1049/iet-cvi.2019.0508