CenterTransFuser: radar point cloud and visual information fusion for 3D object detection

Abstract Sensor fusion is an important component of the perception system in autonomous driving, and fusing radar point cloud information with camera visual information can improve the perception capability of autonomous vehicles. However, most existing studies ignore the extraction of local neighborhood information and consider only shallow fusion between the two modalities based on extracted global information, which cannot perform a deep fusion of cross-modal contextual information interaction. Meanwhile, in data preprocessing, the noise in radar data is usually filtered only by depth information derived from image feature prediction; such methods affect the accuracy of the radar branch in generating regions of interest and cannot effectively filter out irrelevant radar points. This paper proposes the CenterTransFuser model, which makes full use of millimeter-wave radar point cloud information and visual information to enable cross-modal fusion of the two heterogeneous sources. Specifically, a new interaction module called cross-transformer is explored, which cooperatively exploits cross-modal cross-multiple attention and joint cross-multiple attention to mine complementary radar and image information. Meanwhile, an adaptive depth thresholding filtering method is designed to reduce the noise from object-irrelevant radar information projected onto the image. The CenterTransFuser model is evaluated on the challenging nuScenes dataset, where it achieves excellent performance. In particular, detection accuracy is significantly improved for pedestrians, motorcycles, and bicycles, showing the superiority and effectiveness of the proposed model.
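The cross-transformer described above mines complementary information with cross-modal attention in both directions plus a joint attention over the fused token sequence. The following is a minimal PyTorch sketch of that general pattern; the class name, dimensions, and layer arrangement are illustrative assumptions, not the authors' exact CenterTransFuser modules.

```python
# Illustrative sketch of bidirectional cross-attention fusion between radar
# and image feature tokens. Names and dimensions are assumptions; this is
# NOT the paper's exact cross-multiple / joint cross-multiple attention.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Fuses radar and image tokens with two cross-attention passes
    (image queries radar, radar queries image), then a joint attention
    over the concatenated sequence."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.img_from_radar = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.radar_from_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.joint = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_img = nn.LayerNorm(dim)
        self.norm_radar = nn.LayerNorm(dim)
        self.norm_joint = nn.LayerNorm(dim)

    def forward(self, img_tokens: torch.Tensor, radar_tokens: torch.Tensor) -> torch.Tensor:
        # Cross attention: each modality attends to the other to mine
        # complementary context.
        img_ctx, _ = self.img_from_radar(img_tokens, radar_tokens, radar_tokens)
        radar_ctx, _ = self.radar_from_img(radar_tokens, img_tokens, img_tokens)
        img_tokens = self.norm_img(img_tokens + img_ctx)
        radar_tokens = self.norm_radar(radar_tokens + radar_ctx)

        # Joint attention over the concatenated sequence so the fused
        # representation mixes both modalities globally.
        fused = torch.cat([img_tokens, radar_tokens], dim=1)
        joint_ctx, _ = self.joint(fused, fused, fused)
        return self.norm_joint(fused + joint_ctx)


if __name__ == "__main__":
    fusion = CrossModalFusion()
    img = torch.randn(2, 196, 256)   # e.g. 14x14 image feature tokens
    radar = torch.randn(2, 64, 256)  # e.g. 64 projected radar points
    print(fusion(img, radar).shape)  # torch.Size([2, 260, 256])
```

The two directed cross-attention passes give each modality access to the other's context before the joint pass mixes the full sequence, which is the deep, bidirectional interaction the abstract contrasts with shallow global-feature fusion.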

Bibliographic Details
Main Authors: Yan Li, Kai Zeng, Tao Shen
Affiliation: School of Information Engineering and Automation, Kunming University of Science and Technology
Format: Article
Language: English
Published: SpringerOpen, 2023-01-01
Series: EURASIP Journal on Advances in Signal Processing
ISSN: 1687-6180
Subjects: Cross-transformer; Depth threshold filtering; 3D detection; Cross-modal fusion; Contextual interaction
Online Access: https://doi.org/10.1186/s13634-022-00944-6
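The adaptive depth thresholding filter described in the abstract compares each projected radar point's measured depth against the depth predicted at its image location and discards points that disagree too strongly. Below is a hedged NumPy sketch assuming a tolerance that grows linearly with range; the function name and constants are hypothetical, since the paper's exact thresholding rule is not reproduced here.

```python
# Hedged sketch of depth-threshold filtering for radar points projected
# onto the image plane. The adaptive schedule (tolerance growing linearly
# with depth) is an illustrative assumption, not the paper's exact rule.
import numpy as np


def filter_radar_points(radar_depth: np.ndarray,
                        predicted_depth: np.ndarray,
                        base_tol: float = 0.5,
                        rel_tol: float = 0.1) -> np.ndarray:
    """Return a boolean mask keeping radar points whose measured depth
    agrees with the image-predicted depth at their projected pixel.

    radar_depth     : (N,) radar range per projected point, in meters
    predicted_depth : (N,) depth predicted at each point's pixel, in meters
    base_tol        : fixed tolerance floor in meters (assumed value)
    rel_tol         : tolerance fraction scaling with depth (assumed value)
    """
    # Adaptive threshold: nearby points must match tightly; distant points
    # get more slack because both depth sources degrade with range.
    threshold = base_tol + rel_tol * radar_depth
    return np.abs(radar_depth - predicted_depth) <= threshold


if __name__ == "__main__":
    radar = np.array([5.0, 20.0, 40.0, 60.0])
    pred = np.array([5.3, 25.0, 42.0, 80.0])
    print(filter_radar_points(radar, pred))  # [ True False  True False]
```

Making the threshold depth-dependent, rather than a single fixed cutoff, is what lets the filter reject object-irrelevant radar returns near the camera without starving the detector of valid long-range points.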