Quality-Driven Dual-Branch Feature Integration Network for Video Salient Object Detection
Video salient object detection has attracted growing interest in recent years. However, some existing video saliency models make inappropriate use of spatial and temporal cues and insufficiently aggregate features from different levels, which leads to notable performance deg...
Main Authors: | Xiaofei Zhou, Hanxiao Gao, Longxuan Yu, Defu Yang, Jiyong Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-01-01 |
Series: | Electronics |
Subjects: | video salient object detection; quality score; feature fusion; dual-branch |
Online Access: | https://www.mdpi.com/2079-9292/12/3/680 |
_version_ | 1797624749736591360 |
---|---|
author | Xiaofei Zhou, Hanxiao Gao, Longxuan Yu, Defu Yang, Jiyong Zhang |
author_facet | Xiaofei Zhou, Hanxiao Gao, Longxuan Yu, Defu Yang, Jiyong Zhang |
author_sort | Xiaofei Zhou |
collection | DOAJ |
description | Video salient object detection has attracted growing interest in recent years. However, some existing video saliency models make inappropriate use of spatial and temporal cues and insufficiently aggregate features from different levels, which leads to notable performance degradation. Therefore, we propose a quality-driven dual-branch feature integration network that focuses on the adaptive fusion of multi-modal cues and the sufficient aggregation of multi-level spatiotemporal features. First, we employ the quality-driven multi-modal feature fusion (QMFF) module to combine the spatial and temporal features. Specifically, the quality scores estimated from each level’s spatial and temporal cues are used not only to weight the two modal features but also to adaptively integrate the coarse spatial and temporal saliency predictions into a guidance map, which further enhances the two modal features. Second, we deploy the dual-branch-based multi-level feature aggregation (DMFA) module to integrate multi-level spatiotemporal features, where two branches, the progressive decoder branch and the direct concatenation branch, sufficiently explore the cooperation of multi-level spatiotemporal features. In particular, to adaptively fuse the outputs of the two branches, we design the dual-branch fusion (DF) unit, in which the channel weight of each output is learned jointly from the two outputs. Experiments conducted on four video datasets demonstrate the effectiveness of our model and its superiority over state-of-the-art video saliency models. |
first_indexed | 2024-03-11T09:47:29Z |
format | Article |
id | doaj.art-6ab9c233cbb447b18832fe089c48fab9 |
institution | Directory of Open Access Journals |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-11T09:47:29Z |
publishDate | 2023-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
spelling | doaj.art-6ab9c233cbb447b18832fe089c48fab9; 2023-11-16T16:29:56Z; eng; MDPI AG; Electronics; 2079-9292; 2023-01-01; 12; 3; 680; 10.3390/electronics12030680; Quality-Driven Dual-Branch Feature Integration Network for Video Salient Object Detection; Xiaofei Zhou (0), Hanxiao Gao (1), Longxuan Yu (2), Defu Yang (3), Jiyong Zhang (4); affiliations in listed order: School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China; School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China; School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China; School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China; School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China; https://www.mdpi.com/2079-9292/12/3/680; video salient object detection; quality score; feature fusion; dual-branch |
spellingShingle | Xiaofei Zhou Hanxiao Gao Longxuan Yu Defu Yang Jiyong Zhang Quality-Driven Dual-Branch Feature Integration Network for Video Salient Object Detection Electronics video salient object detection quality score feature fusion dual-branch |
title | Quality-Driven Dual-Branch Feature Integration Network for Video Salient Object Detection |
title_full | Quality-Driven Dual-Branch Feature Integration Network for Video Salient Object Detection |
title_fullStr | Quality-Driven Dual-Branch Feature Integration Network for Video Salient Object Detection |
title_full_unstemmed | Quality-Driven Dual-Branch Feature Integration Network for Video Salient Object Detection |
title_short | Quality-Driven Dual-Branch Feature Integration Network for Video Salient Object Detection |
title_sort | quality driven dual branch feature integration network for video salient object detection |
topic | video salient object detection; quality score; feature fusion; dual-branch |
url | https://www.mdpi.com/2079-9292/12/3/680 |
work_keys_str_mv | AT xiaofeizhou qualitydrivendualbranchfeatureintegrationnetworkforvideosalientobjectdetection AT hanxiaogao qualitydrivendualbranchfeatureintegrationnetworkforvideosalientobjectdetection AT longxuanyu qualitydrivendualbranchfeatureintegrationnetworkforvideosalientobjectdetection AT defuyang qualitydrivendualbranchfeatureintegrationnetworkforvideosalientobjectdetection AT jiyongzhang qualitydrivendualbranchfeatureintegrationnetworkforvideosalientobjectdetection |
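
The description above says that quality scores are used both to weight the spatial and temporal features and to combine the coarse saliency predictions into a guidance map. The record gives no formulas, so the following is only a minimal sketch of that idea under stated assumptions: the scores are taken as two plain numbers, they are normalized with a softmax (an assumption, not stated in the record), and features and predictions are flat lists of floats. The function names `normalize_scores`, `fuse_features`, and `guidance_map` are hypothetical and do not come from the paper, whose module is a learned network rather than a fixed formula.

```python
import math


def normalize_scores(q_spatial, q_temporal):
    """Turn two raw quality scores into weights that sum to 1.

    Softmax normalization is an assumption made for this sketch.
    """
    m = max(q_spatial, q_temporal)  # subtract the max for numerical stability
    e_s = math.exp(q_spatial - m)
    e_t = math.exp(q_temporal - m)
    total = e_s + e_t
    return e_s / total, e_t / total


def fuse_features(spatial_feat, temporal_feat, q_spatial, q_temporal):
    """Weight each modality's features by its quality and sum them."""
    w_s, w_t = normalize_scores(q_spatial, q_temporal)
    return [w_s * s + w_t * t for s, t in zip(spatial_feat, temporal_feat)]


def guidance_map(spatial_pred, temporal_pred, q_spatial, q_temporal):
    """Combine two coarse saliency predictions using the same weights."""
    w_s, w_t = normalize_scores(q_spatial, q_temporal)
    return [w_s * s + w_t * t for s, t in zip(spatial_pred, temporal_pred)]


if __name__ == "__main__":
    # Hypothetical toy values: the temporal cue is judged less reliable.
    fused = fuse_features([1.0, 2.0], [3.0, 4.0], q_spatial=2.0, q_temporal=0.0)
    guide = guidance_map([0.9, 0.1], [0.2, 0.8], q_spatial=2.0, q_temporal=0.0)
    print(fused)
    print(guide)
```

With equal scores the two modalities contribute equally; as one score grows relative to the other, the fused output moves toward that modality, which is the behavior the description attributes to quality-driven fusion.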