Salient object detection for RGBD video via spatial interaction and depth-based boundary refinement

Abstract Recently proposed state-of-the-art saliency detection models rely heavily on labeled datasets and seldom achieve effective RGBD feature fusion, which limits their generalization ability. In this paper, we propose a depth-based interaction and refinement network (DIR-Net) to fully leverage the depth information accompanying RGB images to generate and refine the corresponding saliency segmentation maps. Our framework comprises three modules. A depth-based refinement module (DRM) and an RGB module work in parallel while coordinating via interactive spatial guidance modules (ISGMs), which apply spatial and channel attention computed from both depth and RGB features. In each layer, the features in each module are refined and guided by the spatial information obtained from the other module through the ISGMs. In the RGB module, before the depth-guided feature map is sent to the decoder, a convolutional gated recurrent unit (ConvGRU)-based block handles temporal information. Because RGB features carry clearer motion information, this block also guides the temporal information in the DRM. By merging the results from the DRM and RGB modules, a segmentation map with distinct boundaries is generated. Given the lack of depth images in popular public datasets, we use a depth estimation network, combined with manual postprocessing-based correction, to generate depth images for the DAVIS and UVSD datasets. The state-of-the-art performance achieved on both the original and new datasets demonstrates the advantage of our RGBD feature fusion strategy, at a real-time speed of 19 fps on a single GPU.


Bibliographic Details
Main Authors: Yujian Zhang, Ziyan Zhang, Ping Zhang, Mengnan Xu
Format: Article
Language: English
Published: Springer, 2023-05-01
Series: Complex & Intelligent Systems
Subjects: RGBD video; Saliency detection; Boundary optimization; Multiple modalities
Online Access: https://doi.org/10.1007/s40747-023-01072-w
_version_ 1827781011932446720
author Yujian Zhang
Ziyan Zhang
Ping Zhang
Mengnan Xu
author_sort Yujian Zhang
collection DOAJ
description Abstract Recently proposed state-of-the-art saliency detection models rely heavily on labeled datasets and seldom achieve effective RGBD feature fusion, which limits their generalization ability. In this paper, we propose a depth-based interaction and refinement network (DIR-Net) to fully leverage the depth information accompanying RGB images to generate and refine the corresponding saliency segmentation maps. Our framework comprises three modules. A depth-based refinement module (DRM) and an RGB module work in parallel while coordinating via interactive spatial guidance modules (ISGMs), which apply spatial and channel attention computed from both depth and RGB features. In each layer, the features in each module are refined and guided by the spatial information obtained from the other module through the ISGMs. In the RGB module, before the depth-guided feature map is sent to the decoder, a convolutional gated recurrent unit (ConvGRU)-based block handles temporal information. Because RGB features carry clearer motion information, this block also guides the temporal information in the DRM. By merging the results from the DRM and RGB modules, a segmentation map with distinct boundaries is generated. Given the lack of depth images in popular public datasets, we use a depth estimation network, combined with manual postprocessing-based correction, to generate depth images for the DAVIS and UVSD datasets. The state-of-the-art performance achieved on both the original and new datasets demonstrates the advantage of our RGBD feature fusion strategy, at a real-time speed of 19 fps on a single GPU.
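As a rough illustration of the cross-modal guidance idea described in the abstract (not the authors' implementation, which is not included in this record), the following NumPy sketch shows how spatial and channel attention derived from one modality's features could reweight the other modality's features, and vice versa. All function names, attention formulas, and tensor shapes here are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(feat):
    # feat: (C, H, W). Collapse channels to one spatial map, then
    # squash to (0, 1) so it acts as a per-pixel gate.
    return sigmoid(feat.mean(axis=0, keepdims=True))  # (1, H, W)

def channel_attention(feat):
    # Global average pool per channel, squashed to (0, 1):
    # a per-channel gate in the spirit of squeeze-and-excitation.
    return sigmoid(feat.mean(axis=(1, 2), keepdims=True))  # (C, 1, 1)

def isgm(rgb_feat, depth_feat):
    # Hypothetical interactive spatial guidance: each modality is
    # reweighted by attention computed from the *other* modality.
    rgb_out = rgb_feat * spatial_attention(depth_feat) * channel_attention(depth_feat)
    depth_out = depth_feat * spatial_attention(rgb_feat) * channel_attention(rgb_feat)
    return rgb_out, depth_out

rgb = np.random.rand(8, 16, 16)
depth = np.random.rand(8, 16, 16)
rgb_g, depth_g = isgm(rgb, depth)
print(rgb_g.shape, depth_g.shape)  # (8, 16, 16) (8, 16, 16)
```

Because both gates lie in (0, 1), the guided features are elementwise attenuations of the inputs; a learned version would wrap these gates around convolutional layers rather than plain means.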
first_indexed 2024-03-11T15:11:52Z
format Article
id doaj.art-8629adc1b9d345debe3995180d1d6733
institution Directory Open Access Journal
issn 2199-4536
2198-6053
language English
last_indexed 2024-03-11T15:11:52Z
publishDate 2023-05-01
publisher Springer
record_format Article
series Complex & Intelligent Systems
spelling doaj.art-8629adc1b9d345debe3995180d1d6733
Author affiliations: Yujian Zhang, Ziyan Zhang, Ping Zhang, and Mengnan Xu are all with the School of Optoelectronic Science and Engineering, University of Electronic Science and Technology of China.
Published 2023-05-01 in Complex & Intelligent Systems, vol. 9, no. 6, pp. 6343–6358, https://doi.org/10.1007/s40747-023-01072-w
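The ConvGRU-based temporal block mentioned in the abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: the 1x1 kernels (real ConvGRUs typically use larger spatial kernels), the channel count, and the initialization are all assumptions; only the gate equations follow the standard GRU form.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1x1(feat, w):
    # 1x1 convolution: mix channels at every spatial location.
    # feat: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W)
    return np.einsum('oc,chw->ohw', w, feat)

def convgru_step(h, x, params):
    # One ConvGRU update with standard GRU gating.
    wz, uz, wr, ur, wh, uh = params
    z = sigmoid(conv1x1(x, wz) + conv1x1(h, uz))          # update gate
    r = sigmoid(conv1x1(x, wr) + conv1x1(h, ur))          # reset gate
    h_cand = np.tanh(conv1x1(x, wh) + conv1x1(r * h, uh)) # candidate state
    return (1.0 - z) * h + z * h_cand

rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
params = tuple(rng.standard_normal((C, C)) * 0.1 for _ in range(6))
h = np.zeros((C, H, W))
for _ in range(3):  # fold three "frames" into the hidden state
    x = rng.standard_normal((C, H, W))
    h = convgru_step(h, x, params)
print(h.shape)  # (8, 16, 16)
```

Because the hidden state is a convex combination of its previous value and a tanh candidate, it stays bounded across frames, which is what makes such a recurrence a stable way to carry temporal saliency cues through a video.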
title Salient object detection for RGBD video via spatial interaction and depth-based boundary refinement
topic RGBD video
Saliency detection
Boundary optimization
Multiple modalities
url https://doi.org/10.1007/s40747-023-01072-w