Summary: | Multi-modal feature fusion and effectively exploiting high-level semantic information are critical in salient object detection (SOD). However, the depth maps complementing RGB image fusion strategies cannot supply effective semantic information when the object is not salient in the depth maps. Furthermore, most existing (UNet-based) methods cannot fully exploit high-level abstract features to guide low-level features in a coarse-to-fine fashion. In this paper, we propose a compensated attention feature fusion and hierarchical multiplication decoder network (CAF-HMNet) for RGB-D SOD. Specifically, we first propose a compensated attention feature fusion module to fuse multi-modal features based on the complementarity between depth and RGB features. Then, we propose a hierarchical multiplication decoder to refine the multi-level features from top down. Additionally, a contour-aware module is applied to enhance object contour. Experimental results show that our model achieves satisfactory performance on five challenging SOD datasets, including NJU2K, NLPR, STERE, DES, and SIP, which verifies the effectiveness of the proposed CAF-HMNet.
|