Bottom-up top-down cues for weakly-supervised semantic segmentation

We consider the task of learning a classifier for semantic segmentation using weak supervision in the form of image labels specifying the objects present in the image. Our method uses deep convolutional neural networks (CNNs) and adopts an Expectation-Maximization (EM) based approach. We focus on the following three aspects of EM: (i) initialization; (ii) latent posterior estimation (E-step); and (iii) the parameter update (M-step). We show that saliency and attention maps, bottom-up and top-down cues respectively, of images with single objects (simple images) provide highly reliable cues for learning an initialization for the EM. Intuitively, given weak supervision, we first learn to segment simple images and then move towards complex ones. Next, for updating the parameters (M-step), we propose to minimize a combination of the standard softmax loss and the KL divergence between the latent posterior distribution (obtained using the E-step) and the likelihood given by the CNN. This combination is more robust to wrong predictions made by the E-step of the EM algorithm. Extensive experiments and discussions show that our method is simple and intuitive, and outperforms the state-of-the-art method by margins of 3.7% and 3.9% on the PASCAL VOC12 train and test sets respectively, thus setting new state-of-the-art results.


Bibliographic details
Main authors: Hou, Q, Massiceti, D, Dokania, P, Wei, Y, Cheng, M, Torr, P
Format: Conference item
Published: Springer, Cham 2018
author Hou, Q
Massiceti, D
Dokania, P
Wei, Y
Cheng, M
Torr, P
collection OXFORD
description We consider the task of learning a classifier for semantic segmentation using weak supervision in the form of image labels specifying the objects present in the image. Our method uses deep convolutional neural networks (CNNs) and adopts an Expectation-Maximization (EM) based approach. We focus on the following three aspects of EM: (i) initialization; (ii) latent posterior estimation (E-step); and (iii) the parameter update (M-step). We show that saliency and attention maps, bottom-up and top-down cues respectively, of images with single objects (simple images) provide highly reliable cues for learning an initialization for the EM. Intuitively, given weak supervision, we first learn to segment simple images and then move towards complex ones. Next, for updating the parameters (M-step), we propose to minimize a combination of the standard softmax loss and the KL divergence between the latent posterior distribution (obtained using the E-step) and the likelihood given by the CNN. This combination is more robust to wrong predictions made by the E-step of the EM algorithm. Extensive experiments and discussions show that our method is simple and intuitive, and outperforms the state-of-the-art method by margins of 3.7% and 3.9% on the PASCAL VOC12 train and test sets respectively, thus setting new state-of-the-art results.
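The M-step objective described in the abstract combines a standard softmax (cross-entropy) loss with a KL divergence between the E-step's latent posterior and the CNN's predicted likelihood. A minimal per-pixel sketch of that combined loss is below; the function names and the mixing weight `alpha` are hypothetical illustrations, not the paper's actual implementation, and the class scores stand in for one pixel's CNN logits.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one pixel's class scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def m_step_loss(logits, posterior, hard_label, alpha=0.5):
    """Sketch of the combined M-step objective:
    - cross-entropy (softmax loss) against the E-step's hard label, plus
    - KL(posterior || CNN likelihood) against the E-step's soft posterior.
    `alpha` is an assumed mixing weight; the abstract does not specify one."""
    probs = softmax(logits)
    eps = 1e-12  # guard against log(0)
    ce = -math.log(probs[hard_label] + eps)
    kl = sum(q * math.log((q + eps) / (probs[c] + eps))
             for c, q in enumerate(posterior) if q > 0)
    return alpha * ce + (1 - alpha) * kl
```

Because the KL term compares full distributions rather than a single hard label, a confidently wrong E-step label is penalized less when the posterior itself is uncertain, which is the robustness the abstract refers to.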
first_indexed 2024-03-06T21:06:51Z
format Conference item
id oxford-uuid:3cc3f562-d6a3-4bf3-9e10-ee8d8d811eff
institution University of Oxford
last_indexed 2024-03-06T21:06:51Z
publishDate 2018
publisher Springer, Cham
record_format dspace
title Bottom-up top-down cues for weakly-supervised semantic segmentation