Coordinate Attention Filtering Depth-Feature Guide Cross-Modal Fusion RGB-Depth Salient Object Detection


Bibliographic Details
Main Authors: Lingbing Meng, Mengya Yuan, Xuehan Shi, Qingqing Liu, Le Zhange, Jinhua Wu, Ping Dai, Fei Cheng
Format: Article
Language: English
Published: Hindawi Limited 2023-01-01
Series: Advances in Multimedia
Online Access: http://dx.doi.org/10.1155/2023/9921988
Collection: DOAJ
Description: Existing RGB + depth (RGB-D) salient object detection methods mainly focus on better integrating the cross-modal features of RGB images and depth maps. Many methods use the same feature interaction module to fuse RGB and depth maps, which ignores the inherent properties of the different modalities. In contrast to previous methods, this paper proposes a novel RGB-D salient object detection method that uses a depth-feature guide cross-modal fusion module based on the properties of RGB images and depth maps. First, a depth-feature guide cross-modal fusion module is designed using coordinate attention to effectively exploit the simple data representation capability of depth maps. Second, a dense decoder guidance module is proposed to recover the spatial details of salient objects. Furthermore, a context-aware content module is proposed to extract rich context information, which enables more complete prediction of multiple objects. Experimental results on six public benchmark datasets demonstrate that, compared with 15 mainstream convolutional neural network detection methods, the saliency map edge contours detected by the proposed model have better continuity and clearer spatial structure details. Excellent results are achieved on four quantitative evaluation metrics. Furthermore, the effectiveness of the three proposed modules is verified through ablation experiments.
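The abstract names coordinate attention as the mechanism behind the depth-feature guide fusion module but gives no implementation details. As a rough illustration only, the core idea of coordinate attention can be sketched as follows in NumPy; this simplified version replaces the full module's shared convolution, normalization, and nonlinearity with a direct sigmoid gate, and is not the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(feat):
    """Minimal coordinate-attention sketch for a feature map of shape (C, H, W).

    Unlike global average pooling, which collapses both spatial axes,
    coordinate attention pools along each axis separately, so the resulting
    attention weights retain positional information along H and along W.
    """
    C, H, W = feat.shape
    # Direction-aware pooling: averaging over W yields an H-profile per
    # channel; averaging over H yields a W-profile per channel.
    pool_h = feat.mean(axis=2)              # (C, H)
    pool_w = feat.mean(axis=1)              # (C, W)
    # The full module transforms these profiles with shared 1x1 convolutions
    # before gating; here we gate directly with a sigmoid for brevity.
    attn_h = sigmoid(pool_h)[:, :, None]    # (C, H, 1)
    attn_w = sigmoid(pool_w)[:, None, :]    # (C, 1, W)
    # Reweight the input with both direction-aware attention maps.
    return feat * attn_h * attn_w

# Example: reweight a random stand-in for a depth feature map.
depth_feat = np.random.default_rng(0).standard_normal((8, 16, 16))
out = coordinate_attention(depth_feat)
```

Because each sigmoid gate lies in (0, 1), the output is an elementwise-attenuated copy of the input with the same shape, which is what lets such a module slot into a fusion pipeline without changing tensor dimensions.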
First indexed: 2024-04-09T12:34:45Z
Record ID: doaj.art-d9b0eaa7ab3148c3bf6bf559d93e1b16
Institution: Directory Open Access Journal
ISSN: 1687-5699
Last indexed: 2025-02-18T10:43:14Z
Author affiliation: School of Anhui Institute of Information Technology (all authors)