Muti-Frame Point Cloud Feature Fusion Based on Attention Mechanisms for 3D Object Detection

Object detection from continuous point cloud frames is a new research direction. Currently, most studies fuse multi-frame point clouds with concatenation-based methods, which align different frames using GPS, IMU, and similar information. However, this kind of fusion can only align static objects, not moving ones. In this paper, we propose a non-local-based multi-scale feature fusion method that can handle both moving and static objects without GPS- and IMU-based registration. Because non-local methods are resource-consuming, we also propose a novel simplified non-local block that exploits the sparsity of the point cloud: by filtering out empty units, memory consumption decreases by 99.93%. In addition, triple attention is adopted to enhance key object information and suppress background noise, further benefiting the non-local feature fusion. Finally, we verify the method on PointPillars and CenterPoint. Experimental results show that the proposed method improves mAP by 3.9% and 4.1% over the concatenation-based fusion baselines PointPillars-2 and CenterPoint-2, respectively, and outperforms the powerful 3D-VID by 1.2% in mAP.
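The core idea described in the abstract — restricting non-local attention to the non-empty cells of a sparse BEV grid, so the attention matrix shrinks from (H*W)^2 entries to one entry per pair of occupied cells — can be sketched in a few lines. The following is a minimal PyTorch sketch, not the authors' implementation; the module name, tensor shapes, and the residual fusion are assumptions:

# Minimal sketch (assumed names/shapes, not the paper's code) of a
# simplified non-local block that fuses two BEV feature maps while
# computing attention only over non-empty cells.
import torch
import torch.nn as nn

class SparseNonLocalFusion(nn.Module):
    """Fuse a current-frame BEV feature map with a previous-frame one by
    dot-product attention restricted to non-empty grid cells."""

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, cur: torch.Tensor, prev: torch.Tensor) -> torch.Tensor:
        # cur, prev: (B, C, H, W) BEV feature maps from a pillar-style encoder.
        b, c, h, w = cur.shape
        q = self.query(cur).reshape(b, c, -1)   # (B, C, H*W)
        k = self.key(prev).reshape(b, c, -1)
        v = self.value(prev).reshape(b, c, -1)

        # Keep only non-empty cells (any nonzero channel) — the filtering
        # step that shrinks the attention matrix from (H*W)^2 entries to
        # n_cur * n_prev, where n_* are the occupied-cell counts.
        cur_mask = cur.abs().sum(dim=1).reshape(b, -1) > 0    # (B, H*W)
        prev_mask = prev.abs().sum(dim=1).reshape(b, -1) > 0

        out = cur.clone().reshape(b, c, -1)
        for i in range(b):
            qi = q[i][:, cur_mask[i]]           # (C, n_cur)
            ki = k[i][:, prev_mask[i]]          # (C, n_prev)
            vi = v[i][:, prev_mask[i]]          # (C, n_prev)
            attn = torch.softmax(qi.t() @ ki / c ** 0.5, dim=-1)  # (n_cur, n_prev)
            # Scatter the aggregated previous-frame features back into the
            # current frame at the occupied positions (residual fusion).
            out[i][:, cur_mask[i]] = out[i][:, cur_mask[i]] + vi @ attn.t()
        return out.reshape(b, c, h, w)

Filtering in this way makes the attention cost scale with the number of occupied cells per frame rather than the full grid size, which is the source of the large memory savings the abstract reports.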

Bibliographic Details
Main Authors: Zhenyu Zhai, Qiantong Wang, Zongxu Pan, Zhentong Gao, Wenlong Hu
Format: Article
Language: English
Published: MDPI AG, 2022-10-01
Series: Sensors
Subjects: autonomous driving; 3D object detection; point cloud sequences; attention mechanism; feature fusion
Online Access:https://www.mdpi.com/1424-8220/22/19/7473
ISSN: 1424-8220
DOI: 10.3390/s22197473
Volume/Issue: Vol. 22, Issue 19, Article 7473
Author Affiliations: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China (all five authors)
Collection: DOAJ (Directory of Open Access Journals)