Summary: | Fully convolutional networks (FCNs) play a significant role in salient object detection because they can extract abundant multi-level and multi-scale features. However, most FCN-based models use multi-level features in a single, indiscriminate manner, which makes it difficult to predict saliency maps accurately. In this article, we address this problem by proposing a recurrent network that uses hierarchical attention features as guidance for salient object detection. First, we divide multi-level features into low-level and high-level features. Multi-scale features are extracted from the high-level features using atrous convolutions with different receptive fields to obtain contextual information. Meanwhile, the low-level features are refined as a supplement that adds detailed information to the convolutional features. We observe that the attention focus of hierarchical features differs considerably because of their distinct information representations. For this reason, a two-stage attention module is introduced for the hierarchical features to guide the generation of saliency maps. Effective hierarchical attention features are obtained by aggregating the low-level and high-level features, but the attention of the integrated features may be biased, leading to deviations in the detected salient regions. Therefore, we design a recurrent guidance network to correct the biased salient regions, which effectively suppresses background distractions and progressively refines salient object boundaries. Experimental results show that our method exhibits superior performance in both quantitative and qualitative assessments on several widely used benchmark datasets.
|
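
The summary mentions atrous convolutions with different receptive fields for multi-scale feature extraction. As a rough illustration only, the sketch below shows in one dimension how changing the dilation rate widens the span a fixed-size kernel covers. It is not the network described above: the function names, the 1-D setting, and the dilation rates (1, 2, 4) are all assumptions chosen for the example.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of input samples spanned by one dilated (atrous) kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def atrous_conv1d(signal, kernel, dilation):
    """'Valid' 1-D atrous cross-correlation with stride 1 and no padding."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` samples apart in the input.
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(kernel)))
    return out


def multi_scale_features(signal, kernel, dilations=(1, 2, 4)):
    """One branch per dilation rate (rates are illustrative assumptions)."""
    return {d: atrous_conv1d(signal, kernel, d) for d in dilations}


if __name__ == "__main__":
    features = multi_scale_features(list(range(10)), [1, 1, 1])
    for rate, values in features.items():
        print(rate, effective_kernel_size(3, rate), values)
```

With the same three-tap kernel, each branch sees a wider context as the dilation rate grows, while the number of weights stays fixed. In a 2-D network, the branches would typically be padded to a common spatial size before being combined.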