CenterTransFuser: radar point cloud and visual information fusion for 3D object detection

Abstract Sensor fusion is an important component of the perception system in autonomous driving, and fusing radar point cloud information with camera visual information can improve the perception capability of autonomous vehicles. However, most existing studies ignore the extraction of local neighborhood information and consider only shallow fusion between the two modalities based on extracted global information, which cannot perform a deep fusion of cross-modal contextual information interaction. Meanwhile, in data preprocessing, the noise in radar data is usually filtered only by depth information derived from image feature prediction; such methods affect the accuracy of the radar branch in generating regions of interest and cannot effectively filter out irrelevant radar points. This paper proposes the CenterTransFuser model, which makes full use of millimeter-wave radar point cloud information and visual information to enable cross-modal fusion of the two heterogeneous sources. Specifically, a new interaction module called cross-transformer is explored, which cooperatively exploits cross-modal cross-multiple attention and joint cross-multiple attention to mine complementary radar and image information. Meanwhile, an adaptive depth thresholding filtering method is designed to reduce the noise from object-irrelevant radar information projected onto the image. The CenterTransFuser model is evaluated on the challenging nuScenes dataset, where it achieves excellent performance. In particular, detection accuracy is significantly improved for pedestrians, motorcycles, and bicycles, showing the superiority and effectiveness of the proposed model.
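The cross-transformer described above mines complementary information with cross-modal attention in both directions plus a joint attention over the fused token sequence. The following is a minimal PyTorch sketch of that general pattern; the class name, dimensions, and layer arrangement are illustrative assumptions, not the authors' exact CenterTransFuser modules.

```python
# Illustrative sketch of bidirectional cross-attention fusion between radar
# and image feature tokens. Names and dimensions are assumptions; this is
# NOT the paper's exact cross-multiple / joint cross-multiple attention.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Fuses radar and image tokens with two cross-attention passes
    (image queries radar, radar queries image), then a joint attention
    over the concatenated sequence."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.img_from_radar = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.radar_from_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.joint = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_img = nn.LayerNorm(dim)
        self.norm_radar = nn.LayerNorm(dim)
        self.norm_joint = nn.LayerNorm(dim)

    def forward(self, img_tokens: torch.Tensor, radar_tokens: torch.Tensor) -> torch.Tensor:
        # Cross attention: each modality attends to the other to mine
        # complementary context.
        img_ctx, _ = self.img_from_radar(img_tokens, radar_tokens, radar_tokens)
        radar_ctx, _ = self.radar_from_img(radar_tokens, img_tokens, img_tokens)
        img_tokens = self.norm_img(img_tokens + img_ctx)
        radar_tokens = self.norm_radar(radar_tokens + radar_ctx)

        # Joint attention over the concatenated sequence so the fused
        # representation mixes both modalities globally.
        fused = torch.cat([img_tokens, radar_tokens], dim=1)
        joint_ctx, _ = self.joint(fused, fused, fused)
        return self.norm_joint(fused + joint_ctx)


if __name__ == "__main__":
    fusion = CrossModalFusion()
    img = torch.randn(2, 196, 256)   # e.g. 14x14 image feature tokens
    radar = torch.randn(2, 64, 256)  # e.g. 64 projected radar points
    print(fusion(img, radar).shape)  # torch.Size([2, 260, 256])
```

The two directed cross-attention passes give each modality access to the other's context before the joint pass mixes the full sequence, which is the deep, bidirectional interaction the abstract contrasts with shallow global-feature fusion.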

Bibliographic Details
Main Authors: Yan Li, Kai Zeng, Tao Shen
Affiliation: School of Information Engineering and Automation, Kunming University of Science and Technology
Format: Article
Language: English
Published: SpringerOpen, 2023-01-01
Series: EURASIP Journal on Advances in Signal Processing
ISSN: 1687-6180
Subjects: Cross-transformer; Depth threshold filtering; 3D detection; Cross-modal fusion; Contextual interaction
Online Access: https://doi.org/10.1186/s13634-022-00944-6
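The adaptive depth thresholding filter described in the abstract compares each projected radar point's measured depth against the depth predicted at its image location and discards points that disagree too strongly. Below is a hedged NumPy sketch assuming a tolerance that grows linearly with range; the function name and constants are hypothetical, since the paper's exact thresholding rule is not reproduced here.

```python
# Hedged sketch of depth-threshold filtering for radar points projected
# onto the image plane. The adaptive schedule (tolerance growing linearly
# with depth) is an illustrative assumption, not the paper's exact rule.
import numpy as np


def filter_radar_points(radar_depth: np.ndarray,
                        predicted_depth: np.ndarray,
                        base_tol: float = 0.5,
                        rel_tol: float = 0.1) -> np.ndarray:
    """Return a boolean mask keeping radar points whose measured depth
    agrees with the image-predicted depth at their projected pixel.

    radar_depth     : (N,) radar range per projected point, in meters
    predicted_depth : (N,) depth predicted at each point's pixel, in meters
    base_tol        : fixed tolerance floor in meters (assumed value)
    rel_tol         : tolerance fraction scaling with depth (assumed value)
    """
    # Adaptive threshold: nearby points must match tightly; distant points
    # get more slack because both depth sources degrade with range.
    threshold = base_tol + rel_tol * radar_depth
    return np.abs(radar_depth - predicted_depth) <= threshold


if __name__ == "__main__":
    radar = np.array([5.0, 20.0, 40.0, 60.0])
    pred = np.array([5.3, 25.0, 42.0, 80.0])
    print(filter_radar_points(radar, pred))  # [ True False  True False]
```

Making the threshold depth-dependent, rather than a single fixed cutoff, is what lets the filter reject object-irrelevant radar returns near the camera without starving the detector of valid long-range points.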