Muti-Frame Point Cloud Feature Fusion Based on Attention Mechanisms for 3D Object Detection
Object detection from continuous point-cloud frames is a new research direction. Currently, most studies fuse multi-frame point clouds with concatenation-based methods, which align frames using GPS, IMU, and similar information. However, this fusion method can only align...
Main Authors: | Zhenyu Zhai, Qiantong Wang, Zongxu Pan, Zhentong Gao, Wenlong Hu |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-10-01 |
Series: | Sensors |
Subjects: | autonomous driving; 3D object detection; point cloud sequences; attention mechanism; feature fusion |
Online Access: | https://www.mdpi.com/1424-8220/22/19/7473 |
_version_ | 1827653007761735680 |
---|---|
author | Zhenyu Zhai Qiantong Wang Zongxu Pan Zhentong Gao Wenlong Hu |
author_facet | Zhenyu Zhai Qiantong Wang Zongxu Pan Zhentong Gao Wenlong Hu |
author_sort | Zhenyu Zhai |
collection | DOAJ |
description | Object detection from continuous point-cloud frames is a new research direction. Currently, most studies fuse multi-frame point clouds with concatenation-based methods, which align frames using GPS, IMU, and similar information. However, this fusion method can only align static objects, not moving ones. In this paper, we propose a non-local-based multi-scale feature fusion method that handles both moving and static objects without GPS- or IMU-based registration. Because non-local methods are resource-consuming, we propose a novel simplified non-local block that exploits the sparsity of the point cloud: by filtering out empty units, memory consumption is decreased by 99.93%. In addition, triple attention is adopted to enhance key object information and suppress background noise, further benefiting the non-local feature fusion. Finally, we verify the method on PointPillars and CenterPoint. Experimental results show that the proposed method improves mAP by 3.9% and 4.1% over the concatenation-based fusion baselines PointPillars-2 and CenterPoint-2, respectively. In addition, the proposed network outperforms the powerful 3D-VID by 1.2% in mAP. |
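The abstract's central idea — that a non-local (self-attention) block over a pillar feature map becomes cheap once empty cells are filtered out — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function name and the all-zero-means-empty convention are assumptions made here for illustration. The point is that the attention affinity matrix is built only over the M occupied cells (M x M) instead of all N = H x W cells (N x N), which is where the memory saving on sparse LiDAR scans comes from.

```python
import numpy as np

def sparse_non_local(feat, eps=1e-6):
    """Simplified non-local block over a pillar pseudo-image (C, H, W).

    Cells whose feature vector is (near-)zero are treated as empty and
    skipped, so the affinity matrix is M x M over occupied cells rather
    than the dense (H*W) x (H*W) of a standard non-local block.
    """
    C, H, W = feat.shape
    flat = feat.reshape(C, H * W).T            # (N, C) with N = H*W
    occupied = np.abs(flat).sum(axis=1) > eps  # mask of non-empty cells
    x = flat[occupied]                         # (M, C), M << N on sparse scans

    # Plain scaled dot-product attention restricted to occupied cells.
    attn = x @ x.T / np.sqrt(C)                            # (M, M), not (N, N)
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))  # stable softmax
    attn /= attn.sum(axis=1, keepdims=True)

    out = flat.copy()
    out[occupied] = x + attn @ x               # residual connection
    return out.T.reshape(C, H, W)
```

On a map where only a handful of pillars are occupied, the affinity matrix shrinks from N^2 to M^2 entries; empty cells pass through unchanged, which is consistent with the large memory reduction the abstract reports (though the exact 99.93% figure depends on the dataset's occupancy ratio and the authors' block design).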
first_indexed | 2024-03-09T21:10:04Z |
format | Article |
id | doaj.art-5b55a1f60c7a4ec9aacd60adcf87a8ca |
institution | Directory Open Access Journal |
issn | 1424-8220 |
language | English |
last_indexed | 2024-03-09T21:10:04Z |
publishDate | 2022-10-01 |
publisher | MDPI AG |
record_format | Article |
series | Sensors |
spelling | doaj.art-5b55a1f60c7a4ec9aacd60adcf87a8ca2023-11-23T21:49:45ZengMDPI AGSensors1424-82202022-10-012219747310.3390/s22197473Muti-Frame Point Cloud Feature Fusion Based on Attention Mechanisms for 3D Object DetectionZhenyu Zhai0Qiantong Wang1Zongxu Pan2Zhentong Gao3Wenlong Hu4Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, ChinaAerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, ChinaAerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, ChinaAerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, ChinaAerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, ChinaObject detection from continuous point-cloud frames is a new research direction. Currently, most studies fuse multi-frame point clouds with concatenation-based methods, which align frames using GPS, IMU, and similar information. However, this fusion method can only align static objects, not moving ones. In this paper, we propose a non-local-based multi-scale feature fusion method that handles both moving and static objects without GPS- or IMU-based registration. Because non-local methods are resource-consuming, we propose a novel simplified non-local block that exploits the sparsity of the point cloud: by filtering out empty units, memory consumption is decreased by 99.93%. In addition, triple attention is adopted to enhance key object information and suppress background noise, further benefiting the non-local feature fusion. Finally, we verify the method on PointPillars and CenterPoint. Experimental results show that the proposed method improves mAP by 3.9% and 4.1% over the concatenation-based fusion baselines PointPillars-2 and CenterPoint-2, respectively.
In addition, the proposed network outperforms the powerful 3D-VID by 1.2% in mAP.https://www.mdpi.com/1424-8220/22/19/7473autonomous driving3D object detectionpoint cloud sequencesattention mechanismfeature fusion |
spellingShingle | Zhenyu Zhai Qiantong Wang Zongxu Pan Zhentong Gao Wenlong Hu Muti-Frame Point Cloud Feature Fusion Based on Attention Mechanisms for 3D Object Detection Sensors autonomous driving 3D object detection point cloud sequences attention mechanism feature fusion |
title | Muti-Frame Point Cloud Feature Fusion Based on Attention Mechanisms for 3D Object Detection |
title_full | Muti-Frame Point Cloud Feature Fusion Based on Attention Mechanisms for 3D Object Detection |
title_fullStr | Muti-Frame Point Cloud Feature Fusion Based on Attention Mechanisms for 3D Object Detection |
title_full_unstemmed | Muti-Frame Point Cloud Feature Fusion Based on Attention Mechanisms for 3D Object Detection |
title_short | Muti-Frame Point Cloud Feature Fusion Based on Attention Mechanisms for 3D Object Detection |
title_sort | muti frame point cloud feature fusion based on attention mechanisms for 3d object detection |
topic | autonomous driving 3D object detection point cloud sequences attention mechanism feature fusion |
url | https://www.mdpi.com/1424-8220/22/19/7473 |
work_keys_str_mv | AT zhenyuzhai mutiframepointcloudfeaturefusionbasedonattentionmechanismsfor3dobjectdetection AT qiantongwang mutiframepointcloudfeaturefusionbasedonattentionmechanismsfor3dobjectdetection AT zongxupan mutiframepointcloudfeaturefusionbasedonattentionmechanismsfor3dobjectdetection AT zhentonggao mutiframepointcloudfeaturefusionbasedonattentionmechanismsfor3dobjectdetection AT wenlonghu mutiframepointcloudfeaturefusionbasedonattentionmechanismsfor3dobjectdetection |