Bottom-up top-down cues for weakly-supervised semantic segmentation

We consider the task of learning a classifier for semantic segmentation using weak supervision in the form of image labels specifying the objects present in the image. Our method uses deep convolutional neural networks (CNNs) and adopts an Expectation-Maximization (EM) based approach. We focus on the following three aspects of EM: (i) initialization; (ii) latent posterior estimation (E-step); and (iii) the parameter update (M-step). We show that saliency and attention maps, bottom-up and top-down cues respectively, of images with single objects (simple images) provide highly reliable cues for learning an initialization for the EM. Intuitively, given weak supervision, we first learn to segment simple images and then move towards complex ones. Next, for updating the parameters (M-step), we propose to minimize a combination of the standard softmax loss and the KL divergence between the latent posterior distribution (obtained using the E-step) and the likelihood given by the CNN. This combination is more robust to wrong predictions made by the E-step of the EM algorithm. Extensive experiments and discussions show that our method is simple and intuitive, and outperforms the state-of-the-art method by margins of 3.7% and 3.9% on the PASCAL VOC12 train and test sets respectively, thus setting new state-of-the-art results.


Bibliographic details
Main authors: Hou, Q, Massiceti, D, Dokania, P, Wei, Y, Cheng, M, Torr, P
Format: Conference item
Published: Springer, Cham 2018
author Hou, Q
Massiceti, D
Dokania, P
Wei, Y
Cheng, M
Torr, P
collection OXFORD
description We consider the task of learning a classifier for semantic segmentation using weak supervision in the form of image labels specifying the objects present in the image. Our method uses deep convolutional neural networks (CNNs) and adopts an Expectation-Maximization (EM) based approach. We focus on the following three aspects of EM: (i) initialization; (ii) latent posterior estimation (E-step); and (iii) the parameter update (M-step). We show that saliency and attention maps, bottom-up and top-down cues respectively, of images with single objects (simple images) provide highly reliable cues for learning an initialization for the EM. Intuitively, given weak supervision, we first learn to segment simple images and then move towards complex ones. Next, for updating the parameters (M-step), we propose to minimize a combination of the standard softmax loss and the KL divergence between the latent posterior distribution (obtained using the E-step) and the likelihood given by the CNN. This combination is more robust to wrong predictions made by the E-step of the EM algorithm. Extensive experiments and discussions show that our method is simple and intuitive, and outperforms the state-of-the-art method by margins of 3.7% and 3.9% on the PASCAL VOC12 train and test sets respectively, thus setting new state-of-the-art results.
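The M-step objective described in the abstract combines a standard softmax (cross-entropy) loss with a KL divergence between the E-step's latent posterior and the CNN's predicted likelihood. A minimal per-pixel sketch of that combined loss is below; the function names and the mixing weight `alpha` are hypothetical illustrations, not the paper's actual implementation, and the class scores stand in for one pixel's CNN logits.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one pixel's class scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def m_step_loss(logits, posterior, hard_label, alpha=0.5):
    """Sketch of the combined M-step objective:
    - cross-entropy (softmax loss) against the E-step's hard label, plus
    - KL(posterior || CNN likelihood) against the E-step's soft posterior.
    `alpha` is an assumed mixing weight; the abstract does not specify one."""
    probs = softmax(logits)
    eps = 1e-12  # guard against log(0)
    ce = -math.log(probs[hard_label] + eps)
    kl = sum(q * math.log((q + eps) / (probs[c] + eps))
             for c, q in enumerate(posterior) if q > 0)
    return alpha * ce + (1 - alpha) * kl
```

Because the KL term compares full distributions rather than a single hard label, a confidently wrong E-step label is penalized less when the posterior itself is uncertain, which is the robustness the abstract refers to.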
first_indexed 2024-03-06T21:06:51Z
format Conference item
id oxford-uuid:3cc3f562-d6a3-4bf3-9e10-ee8d8d811eff
institution University of Oxford
last_indexed 2024-03-06T21:06:51Z
publishDate 2018
publisher Springer, Cham
record_format dspace
title Bottom-up top-down cues for weakly-supervised semantic segmentation