DeepVoting: A Robust and Explainable Deep Network for Semantic Part Detection under Partial Occlusion

In this paper, we study the task of detecting semantic parts of an object, e.g., a wheel of a car, under partial occlusion. We propose that all models should be trained without seeing occlusions while being able to transfer the learned knowledge to deal with occlusions. This setting alleviates the d...

Bibliographic Details
Main Authors:	Zhang, Zhishuai; Xie, Cihang; Wang, Jianyu; Xie, Lingxi; Yuille, Alan L.
Format:	Technical Report
Language:	en_US
Published:	Center for Brains, Minds and Machines (CBMM) 2018
Online Access:	http://hdl.handle.net/1721.1/115181

_version_	1826195038131978240
author	Zhang, Zhishuai; Xie, Cihang; Wang, Jianyu; Xie, Lingxi; Yuille, Alan L.
author_facet	Zhang, Zhishuai; Xie, Cihang; Wang, Jianyu; Xie, Lingxi; Yuille, Alan L.
author_sort	Zhang, Zhishuai
collection	MIT
description	In this paper, we study the task of detecting semantic parts of an object, e.g., a wheel of a car, under partial occlusion. We propose that all models should be trained without seeing occlusions while being able to transfer the learned knowledge to deal with occlusions. This setting alleviates the difficulty in collecting an exponentially large dataset to cover occlusion patterns and is more essential. In this scenario, the proposal-based deep networks, like RCNN-series, often produce unsatisfactory results, because both the proposal extraction and classification stages may be confused by the irrelevant occluders. To address this, [25] proposed a voting mechanism that combines multiple local visual cues to detect semantic parts. The semantic parts can still be detected even though some visual cues are missing due to occlusions. However, this method is manually-designed, thus is hard to be optimized in an end-to-end manner. In this paper, we present DeepVoting, which incorporates the robustness shown by [25] into a deep network, so that the whole pipeline can be jointly optimized. Specifically, it adds two layers after the intermediate features of a deep network, e.g., the pool-4 layer of VGGNet. The first layer extracts the evidence of local visual cues, and the second layer performs a voting mechanism by utilizing the spatial relationship between visual cues and semantic parts. We also propose an improved version DeepVoting+ by learning visual cues from context outside objects. In experiments, DeepVoting achieves significantly better performance than several baseline methods, including Faster-RCNN, for semantic part detection under occlusion. In addition, DeepVoting enjoys explainability as the detection results can be diagnosed via looking up the voting cues.
first_indexed	2024-09-23T10:05:46Z
format	Technical Report
id	mit-1721.1/115181
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T10:05:46Z
publishDate	2018
publisher	Center for Brains, Minds and Machines (CBMM)
record_format	dspace
spelling	mit-1721.1/115181 2019-04-11T01:09:46Z This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. 2018-05-02T18:03:01Z 2018-05-02T18:03:01Z 2018-06-19 Technical Report Working Paper Other http://hdl.handle.net/1721.1/115181 en_US CBMM Memo Series;083 application/pdf Center for Brains, Minds and Machines (CBMM)
spellingShingle	Zhang, Zhishuai Xie, Cihang Wang, Jianyu Xie, Lingxi Yuille, Alan L. DeepVoting: A Robust and Explainable Deep Network for Semantic Part Detection under Partial Occlusion
title	DeepVoting: A Robust and Explainable Deep Network for Semantic Part Detection under Partial Occlusion
title_full	DeepVoting: A Robust and Explainable Deep Network for Semantic Part Detection under Partial Occlusion
title_fullStr	DeepVoting: A Robust and Explainable Deep Network for Semantic Part Detection under Partial Occlusion
title_full_unstemmed	DeepVoting: A Robust and Explainable Deep Network for Semantic Part Detection under Partial Occlusion
title_short	DeepVoting: A Robust and Explainable Deep Network for Semantic Part Detection under Partial Occlusion
title_sort	deepvoting a robust and explainable deep network for semantic part detection under partial occlusion
url	http://hdl.handle.net/1721.1/115181

Similar Items