RTL3D: real‐time LIDAR‐based 3D object detection with sparse CNN
LIDAR (light detection and ranging) based real‐time 3D perception is crucial for applications such as autonomous driving. However, most convolutional neural network (CNN) based methods are time‐consuming and computation‐intensive. These drawbacks are mainly attributed to the highly variable d...
Main Authors: | Lin Yan, Kai Liu, Evgeny Belyaev, Meiyu Duan |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2020-08-01 |
Series: | IET Computer Vision |
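Subjects: | real-time LIDAR-based 3D object detection; convolutional neural network; LIDAR point cloud; real-time LIDAR-based 3D detector; sparse feature learning network; voxelised 3D data |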
Online Access: | https://doi.org/10.1049/iet-cvi.2019.0508 |
_version_ | 1797684736425984000 |
---|---|
author | Lin Yan; Kai Liu; Evgeny Belyaev; Meiyu Duan |
author_sort | Lin Yan |
collection | DOAJ |
description | LIDAR (light detection and ranging) based real‐time 3D perception is crucial for applications such as autonomous driving. However, most convolutional neural network (CNN) based methods are time‐consuming and computation‐intensive. These drawbacks are mainly attributed to the highly variable density of LIDAR point clouds and the complexity of the methods' pipelines. To balance speed and accuracy in 3D object detection from LIDAR, the authors propose RTL3D, a computationally efficient real‐time LIDAR‐based 3D detector. In RTL3D, an effective voxel‐wise feature representation is utilised to organise the unstructured point cloud. By employing a sparse feature learning network (SFLN) on voxelised 3D data, RTL3D exploits the sparsity of the point cloud and down‐samples the 3D data into 2D. Based on the generated 2D feature map, an optimised dense detection network (DDN) is applied to regress the oriented bounding box without relying on any predefined anchor boxes. The authors also introduce an incremental data augmentation approach which greatly improves the performance of RTL3D. Empirical experiments on the public KITTI benchmark demonstrate that RTL3D achieves performance competitive with state‐of‐the‐art work on the 3D detection task. Owing to the simplicity of its single‐stage and anchor‐free design, RTL3D has a real‐time inference speed of 40 FPS. |
first_indexed | 2024-03-12T00:34:06Z |
format | Article |
id | doaj.art-5d88358e014e4136bebf9bdac18c5531 |
institution | Directory Open Access Journal |
issn | 1751-9632 1751-9640 |
language | English |
last_indexed | 2024-03-12T00:34:06Z |
publishDate | 2020-08-01 |
publisher | Wiley |
record_format | Article |
series | IET Computer Vision |
spelling | doaj.art-5d88358e014e4136bebf9bdac18c5531; 2023-09-15T10:06:16Z; eng; Wiley; IET Computer Vision; ISSN 1751-9632, 1751-9640; 2020-08-01; vol. 14, no. 5, pp. 224-232; 10.1049/iet-cvi.2019.0508; RTL3D: real‐time LIDAR‐based 3D object detection with sparse CNN; Lin Yan (School of Computer Science, Xidian University, 2 South Taibai Road, Xi'an, Shaanxi, People's Republic of China); Kai Liu (School of Computer Science, Xidian University, 2 South Taibai Road, Xi'an, Shaanxi, People's Republic of China); Evgeny Belyaev (ITMO University, St Petersburg, Russia); Meiyu Duan (School of Computer Science, Xidian University, 2 South Taibai Road, Xi'an, Shaanxi, People's Republic of China); https://doi.org/10.1049/iet-cvi.2019.0508 |
title | RTL3D: real‐time LIDAR‐based 3D object detection with sparse CNN |
topic | real-time LIDAR-based 3D object detection; convolutional neural network; LIDAR point cloud; real-time LIDAR-based 3D detector; sparse feature learning network; voxelised 3D data |
url | https://doi.org/10.1049/iet-cvi.2019.0508 |
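
The description in this record mentions a voxel‐wise representation that organises the unstructured point cloud, followed by down‐sampling of the 3D data into a 2D map. The following is a minimal illustrative sketch of that general idea in plain Python, not the authors' SFLN or DDN. The function names (`voxelise`, `voxel_features`, `to_bev`), the count-and-centroid feature, and the voxel size are all assumptions made for illustration; the record does not specify any of them.

```python
from collections import defaultdict
from math import floor


def voxelise(points, voxel_size):
    """Group (x, y, z) points by integer voxel index.

    Only occupied voxels are stored, so the dictionary is a sparse
    representation of the grid (hypothetical helper, not from the paper).
    """
    vx, vy, vz = voxel_size
    voxels = defaultdict(list)
    for x, y, z in points:
        key = (floor(x / vx), floor(y / vy), floor(z / vz))
        voxels[key].append((x, y, z))
    return voxels


def voxel_features(voxels):
    """Assumed per-voxel feature: (point count, centroid)."""
    features = {}
    for key, pts in voxels.items():
        n = len(pts)
        centroid = tuple(sum(p[i] for p in pts) / n for i in range(3))
        features[key] = (n, centroid)
    return features


def to_bev(features):
    """Collapse the vertical axis into a 2D bird's-eye-view map.

    Each (ix, iy) cell keeps the total point count and the highest
    occupied vertical index among its voxels.
    """
    bev = {}
    for (ix, iy, iz), (n, _centroid) in features.items():
        count, top = bev.get((ix, iy), (0, iz))
        bev[(ix, iy)] = (count + n, max(top, iz))
    return bev
```

Calling `to_bev(voxel_features(voxelise(points, (0.5, 0.5, 0.5))))` on a list of `(x, y, z)` tuples yields a dictionary keyed by 2D cell index. A learned network would replace the hand-written count and centroid with trained features; the dictionary-based storage is only meant to show why skipping empty voxels keeps the work proportional to the number of occupied cells.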