MCF3D: Multi-Stage Complementary Fusion for Multi-Sensor 3D Object Detection

We present MCF3D, a multi-stage complementary fusion three-dimensional (3D) object detection network for autonomous driving, robot navigation, and virtual reality. This is an end-to-end learnable architecture, which takes both LIDAR point clouds and RGB images as inputs and utilizes a 3D region proposal subnet and second stage detector(s) subnet to achieve high-precision oriented 3D bounding box prediction. To fully exploit the strength of multimodal information, we design a series of fine and targeted fusion methods based on the attention mechanism and prior knowledge, including “pre-fusion,” “anchor-fusion,” and “proposal-fusion.” Our proposed RGB-Intensity form encodes the reflection intensity onto the input image to strengthen the representational power. Our designed proposal-element attention module allows the network to be guided to focus more on efficient and critical information with negligible overheads. In addition, we propose a cascade-enhanced detector for small classes, which is more selective against close false positives. The experiments on the challenging KITTI benchmark show that our MCF3D method produces state-of-the-art results while running in near real-time with a low memory footprint.
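The abstract describes the RGB-Intensity input only at a high level: LiDAR reflection intensity is encoded onto the input image. As a rough illustration only (not the authors' implementation), a common way to realize this is to project each LiDAR return into the image with the camera matrix and paint its reflectance into a fourth channel; the function name `rgb_intensity` and the pinhole projection setup below are assumptions for the sketch:

```python
import numpy as np

def rgb_intensity(image, points, intensities, P):
    """Project LiDAR points into the image and paint their reflection
    intensity into a fourth channel, yielding an RGB-Intensity tensor.

    image:       (H, W, 3) uint8 RGB image
    points:      (N, 3) LiDAR points, already in the camera frame
    intensities: (N,) reflectance values in [0, 1]
    P:           (3, 4) camera projection matrix
    """
    H, W, _ = image.shape
    # Homogeneous coordinates, then pinhole projection.
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    proj = pts_h @ P.T                      # (N, 3)
    z = proj[:, 2]
    valid = z > 0                           # keep points in front of the camera
    u = (proj[valid, 0] / z[valid]).astype(int)
    v = (proj[valid, 1] / z[valid]).astype(int)
    inten = intensities[valid]
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)

    # Fourth channel: sparse intensity map (0 where there is no LiDAR return).
    ch = np.zeros((H, W), dtype=np.float32)
    ch[v[inside], u[inside]] = inten[inside]
    return np.dstack([image.astype(np.float32) / 255.0, ch])  # (H, W, 4)
```

The resulting four-channel tensor can then feed the image branch of a fusion network in place of plain RGB.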


Bibliographic Details
Main Authors: Jiarong Wang, Ming Zhu, Deyao Sun, Bo Wang, Wen Gao, Hua Wei
Affiliation: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China
Format: Article
Language: English
Published: IEEE, 2019-01-01
Series: IEEE Access, vol. 7, pp. 90801-90814
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2019.2927012
Subjects: 3D object detection; multi-sensor fusion; attention mechanism; autonomous driving
Online Access: https://ieeexplore.ieee.org/document/8756006/