Summary: | Weakly supervised instance segmentation (WSIS) provides a promising way to address instance segmentation in the absence of sufficient labeled data for training. Previous attempts on WSIS usually follow a proposal-based paradigm, critical to which is the proposal scoring strategy. These works mostly rely on certain heuristic strategies for proposal scoring, which largely hampers the sustainable advances concerning WSIS. Towards this end, this paper introduces a novel framework for weakly supervised instance segmentation, called Weakly Supervised R-CNN (WS-RCNN). The basic idea is to deploy a deep network to learn to score proposals, under the special setting of weak supervision. To tackle the key issue of acquiring proposal-level pseudo labels for model training, we propose a so-called Attention-Guided Pseudo Labeling (AGPL) strategy, which leverages the local maximal (peaks) in image-level attention maps and the spatial relationship among peaks and proposals to infer pseudo labels. We also suggest a novel training loss, called Entropic OpenSet Loss, to handle background proposals more effectively so as to further improve the robustness. Comprehensive experiments on two standard benchmarking datasets demonstrate that the proposed WS-RCNN can outperform the state-of-the-art by a large margin, with an improvement of <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>11.6</mn><mo>%</mo></mrow></semantics></math></inline-formula> on PASCAL VOC 2012 and <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>10.7</mn><mo>%</mo></mrow></semantics></math></inline-formula> on MS COCO 2014 in terms of mAP<inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msub><mrow></mrow><mn>50</mn></msub></semantics></math></inline-formula>, which indicates that learning-based proposal scoring and the proposed WS-RCNN framework might be a promising way towards WSIS.
|