Summary: | Semantic segmentation, which aims at pixel-level understanding of images, is a key component of computer vision. Training a good segmentation model for real-world images usually requires a huge amount of time and labor to obtain sufficient pixel-level annotations beforehand. To avoid this burden, one can use simulators to automatically generate synthetic images that inherently come with full pixel-level annotations, and use them to train a segmentation model for real-world images. However, training on synthetic images usually does not lead to good performance because of the domain difference between the synthetic images (i.e., the source domain) and the real-world images (i.e., the target domain). To deal with this issue, a number of unsupervised domain adaptation (UDA) approaches have been proposed, in which no labeled real-world images are available. Different from those methods, in this work we make a pioneering attempt to use easy-to-collect image-level annotations of target images to improve the performance of cross-domain segmentation. Specifically, we leverage those image-level annotations to construct curriculums for the domain adaptation problem. The curriculums describe multi-level properties of the target domain, including label distributions over full images, local regions, and single pixels. Since image-level annotations are 'weak' labels compared to pixel-level annotations for segmentation, we term this new problem weakly-supervised cross-domain segmentation. Comprehensive experiments on the GTA5 -> Cityscapes and SYNTHIA -> Cityscapes settings demonstrate the effectiveness of our method over existing state-of-the-art baselines.
|