Multi-Task Foreground-Aware Network with Depth Completion for Enhanced RGB-D Fusion Object Detection Based on Transformer


Bibliographic Details
Main Authors: Jiasheng Pan, Songyi Zhong, Tao Yue, Yankun Yin, Yanhao Tang
Format: Article
Language: English
Published: MDPI AG, 2024-04-01
Series: Sensors
Online Access: https://www.mdpi.com/1424-8220/24/7/2374
Description: Fusing multiple sensor perceptions, specifically LiDAR and camera, is a prevalent method for target recognition in autonomous driving systems. Traditional object detection algorithms are limited by the sparse nature of LiDAR point clouds, resulting in poor fusion performance, especially for detecting small and distant targets. In this paper, a multi-task parallel neural network based on the Transformer is constructed to simultaneously perform depth completion and object detection. The loss functions are redesigned to reduce environmental noise in depth completion, and a new fusion module is designed to enhance the network’s perception of the foreground and background. The network leverages the correlation between RGB pixels for depth completion, completing the LiDAR point cloud and addressing the mismatch between sparse LiDAR features and dense pixel features. Subsequently, we extract depth map features and effectively fuse them with RGB features, fully utilizing the depth feature differences between foreground and background to enhance object detection performance, especially for challenging targets. Compared to the baseline network, improvements of 4.78%, 8.93%, and 15.54% are achieved on the hard-difficulty metrics for cars, pedestrians, and cyclists, respectively. Experimental results also demonstrate that the network achieves a speed of 38 fps, validating the efficiency and feasibility of the proposed method.
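The foreground-aware fusion idea from the abstract can be sketched in miniature. This is an illustrative assumption, not the paper's actual module: every function name and the weighting scheme below are invented for the sketch. The intuition is that pixels whose completed depth deviates strongly from the scene mean are likely foreground, so their depth features are weighted more heavily when fused with RGB features.

```python
def foreground_weight(depth):
    """Per-pixel foreground weight from a dense (completed) depth map.

    depth: H x W nested list of floats. Pixels whose depth deviates most
    from the scene mean get weights near 1 (likely foreground objects);
    pixels near the mean get weights near 0 (likely background).
    """
    flat = [d for row in depth for d in row]
    mean = sum(flat) / len(flat)
    dev = [[abs(d - mean) for d in row] for row in depth]
    peak = max(max(row) for row in dev) or 1.0   # avoid divide-by-zero on flat scenes
    return [[d / peak for d in row] for row in dev]

def fuse(rgb_feat, depth_feat, depth):
    """Additive RGB-D fusion, scaling depth features by the foreground weight.

    rgb_feat, depth_feat: H x W x C nested lists of per-pixel feature vectors.
    """
    w = foreground_weight(depth)
    return [[[r + w[i][j] * d
              for r, d in zip(rgb_feat[i][j], depth_feat[i][j])]
             for j in range(len(rgb_feat[0]))]
            for i in range(len(rgb_feat))]

# Toy 2x2 scene: one pixel stands out in depth, so it gets the full
# depth-feature contribution; background pixels get an attenuated one.
depth = [[1.0, 1.0], [1.0, 5.0]]
rgb   = [[[0.2] * 3 for _ in range(2)] for _ in range(2)]
dfeat = [[[1.0] * 3 for _ in range(2)] for _ in range(2)]
fused = fuse(rgb, dfeat, depth)
print(fused[1][1][0])  # 1.2 (foreground pixel: 0.2 + 1.0 * 1.0)
```

In the paper the weighting is learned inside a Transformer-based fusion module rather than derived from a hand-set deviation rule; the sketch only shows why a dense, completed depth map makes foreground/background separable at all.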
ISSN: 1424-8220
DOI: 10.3390/s24072374
Collection: DOAJ (Directory of Open Access Journals)
Keywords: point cloud data; YOLO; Transformer; multi-source feature fusion; depth completion
Author Affiliations:
Jiasheng Pan: School of Computer Engineering and Science, Shanghai University, No. 99 Shangda Road, Shanghai 200444, China
Songyi Zhong: School of Mechatronic Engineering and Automation, Shanghai University, No. 99 Shangda Road, Shanghai 200444, China
Tao Yue: School of Mechatronic Engineering and Automation, Shanghai University, No. 99 Shangda Road, Shanghai 200444, China
Yankun Yin: School of Artificial Intelligence, Shanghai University, No. 99 Shangda Road, Shanghai 200444, China
Yanhao Tang: School of Artificial Intelligence, Shanghai University, No. 99 Shangda Road, Shanghai 200444, China