MCF3D: Multi-Stage Complementary Fusion for Multi-Sensor 3D Object Detection
We present MCF3D, a multi-stage complementary fusion three-dimensional (3D) object detection network for autonomous driving, robot navigation, and virtual reality. This is an end-to-end learnable architecture, which takes both LIDAR point clouds and RGB images as inputs and utilizes a 3D region proposal subnet and second stage detector(s) subnet to achieve high-precision oriented 3D bounding box prediction.
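The abstract's "RGB-Intensity" input — encoding LiDAR reflection intensity onto the RGB image — can be illustrated with a minimal sketch. This record does not specify the paper's exact encoding, so the helper below (`make_rgb_intensity`, a hypothetical name), the KITTI-style 3×4 projection matrix `P`, and the choice of appending intensity as a fourth image channel are all assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def make_rgb_intensity(image, points, intensity, P):
    """Sketch of an RGB-Intensity input: project LiDAR points into the
    image plane and store each point's reflection intensity in a fourth
    channel alongside the normalized RGB values.

    image:     (H, W, 3) uint8 RGB image
    points:    (N, 3) LiDAR points, already in the camera frame (assumed)
    intensity: (N,) reflection intensities in [0, 1]
    P:         (3, 4) camera projection matrix (KITTI-style, assumed)
    """
    H, W, _ = image.shape
    # Homogeneous coordinates, then pinhole projection onto the image.
    homo = np.hstack([points, np.ones((points.shape[0], 1))])
    proj = homo @ P.T                      # (N, 3)
    keep = proj[:, 2] > 0                  # keep points in front of the camera
    uv = proj[keep, :2] / proj[keep, 2:3]  # perspective divide
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)

    # Four-channel output: RGB in [0, 1] plus a sparse intensity channel.
    out = np.zeros((H, W, 4), dtype=np.float32)
    out[..., :3] = image / 255.0
    out[v[inside], u[inside], 3] = intensity[keep][inside]
    return out
```

The intensity channel is sparse (nonzero only where a LiDAR point projects), which is one plausible way such a fused input could "strengthen the representational power" of the image branch.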
Main Authors: | Jiarong Wang, Ming Zhu, Deyao Sun, Bo Wang, Wen Gao, Hua Wei |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2019-01-01 |
Series: | IEEE Access |
Subjects: | 3D object detection; multi-sensor fusion; attention mechanism; autonomous driving |
Online Access: | https://ieeexplore.ieee.org/document/8756006/ |
_version_ | 1818557532900163584 |
---|---|
author | Jiarong Wang Ming Zhu Deyao Sun Bo Wang Wen Gao Hua Wei |
author_facet | Jiarong Wang Ming Zhu Deyao Sun Bo Wang Wen Gao Hua Wei |
author_sort | Jiarong Wang |
collection | DOAJ |
description | We present MCF3D, a multi-stage complementary fusion three-dimensional (3D) object detection network for autonomous driving, robot navigation, and virtual reality. This is an end-to-end learnable architecture, which takes both LIDAR point clouds and RGB images as inputs and utilizes a 3D region proposal subnet and second stage detector(s) subnet to achieve high-precision oriented 3D bounding box prediction. To fully exploit the strength of multimodal information, we design a series of fine and targeted fusion methods based on the attention mechanism and prior knowledge, including “pre-fusion,” “anchor-fusion,” and “proposal-fusion.” Our proposed RGB-Intensity form encodes the reflection intensity onto the input image to strengthen the representational power. Our designed proposal-element attention module allows the network to be guided to focus more on efficient and critical information with negligible overheads. In addition, we propose a cascade-enhanced detector for small classes, which is more selective against close false positives. The experiments on the challenging KITTI benchmark show that our MCF3D method produces state-of-the-art results while running in near real-time with a low memory footprint. |
first_indexed | 2024-12-14T00:00:45Z |
format | Article |
id | doaj.art-c29c69d641d14cb5b210ce51e9a8ac2f |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-14T00:00:45Z |
publishDate | 2019-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-c29c69d641d14cb5b210ce51e9a8ac2f; 2022-12-21T23:26:21Z; eng; IEEE; IEEE Access; 2169-3536; 2019-01-01; vol. 7, pp. 90801-90814; doi:10.1109/ACCESS.2019.2927012; document 8756006; MCF3D: Multi-Stage Complementary Fusion for Multi-Sensor 3D Object Detection; Jiarong Wang (https://orcid.org/0000-0002-0377-8083), Ming Zhu, Deyao Sun, Bo Wang, Wen Gao, Hua Wei, all of the Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China; abstract as given in the description field; https://ieeexplore.ieee.org/document/8756006/; 3D object detection; multi-sensor fusion; attention mechanism; autonomous driving |
spellingShingle | Jiarong Wang Ming Zhu Deyao Sun Bo Wang Wen Gao Hua Wei MCF3D: Multi-Stage Complementary Fusion for Multi-Sensor 3D Object Detection IEEE Access 3D object detection multi-sensor fusion attention mechanism autonomous driving |
title | MCF3D: Multi-Stage Complementary Fusion for Multi-Sensor 3D Object Detection |
title_full | MCF3D: Multi-Stage Complementary Fusion for Multi-Sensor 3D Object Detection |
title_fullStr | MCF3D: Multi-Stage Complementary Fusion for Multi-Sensor 3D Object Detection |
title_full_unstemmed | MCF3D: Multi-Stage Complementary Fusion for Multi-Sensor 3D Object Detection |
title_short | MCF3D: Multi-Stage Complementary Fusion for Multi-Sensor 3D Object Detection |
title_sort | mcf3d multi stage complementary fusion for multi sensor 3d object detection |
topic | 3D object detection multi-sensor fusion attention mechanism autonomous driving |
url | https://ieeexplore.ieee.org/document/8756006/ |
work_keys_str_mv | AT jiarongwang mcf3dmultistagecomplementaryfusionformultisensor3dobjectdetection AT mingzhu mcf3dmultistagecomplementaryfusionformultisensor3dobjectdetection AT deyaosun mcf3dmultistagecomplementaryfusionformultisensor3dobjectdetection AT bowang mcf3dmultistagecomplementaryfusionformultisensor3dobjectdetection AT wengao mcf3dmultistagecomplementaryfusionformultisensor3dobjectdetection AT huawei mcf3dmultistagecomplementaryfusionformultisensor3dobjectdetection |