Visual concepts and compositional voting

It is very attractive to formulate vision in terms of pattern theory [26], where patterns are defined hierarchically by compositions of elementary building blocks. But applying pattern theory to real-world images is very challenging and is currently less successful than discriminative methods such as deep networks. Deep networks, however, are black boxes that are hard to interpret and, as we will show, can easily be fooled by adding occluding objects. It is natural to wonder whether, by better understanding deep networks, we can extract building blocks for developing pattern-theoretic models. This motivates us to study the internal feature vectors of a deep network using images of vehicles from the PASCAL3D+ dataset with the scale of objects fixed. We use clustering algorithms, such as K-means, to study the population activity of the features and extract a set of visual concepts, which we show are visually tight and correspond to semantic parts of the vehicles. To analyze this in more detail, we annotate these vehicles by their semantic parts to create a new dataset, which we call VehicleSemanticParts, and evaluate visual concepts as unsupervised semantic part detectors. Our results show that visual concepts perform fairly well but are outperformed by supervised discriminative methods such as Support Vector Machines. We next give a more detailed analysis of visual concepts and how they relate to semantic parts. Following this analysis, we use the visual concepts as building blocks for a simple pattern-theoretic model, which we call compositional voting, in which several visual concepts combine to detect semantic parts. We show that this approach is significantly better than discriminative methods like Support Vector Machines and deep networks trained specifically for semantic part detection. Finally, we return to studying occlusion by creating an annotated dataset with occlusion, called VehicleOcclusion, and show that compositional voting outperforms even deep networks when the amount of occlusion becomes large.
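The pipeline the abstract describes — cluster a network's internal feature vectors into "visual concepts", then let several concepts vote for a semantic part — can be sketched briefly. The following is a minimal illustration, not the authors' code: the choice of a VGG-style pool4 layer, the value of K, the firing threshold, and the all-zero spatial offsets are placeholder assumptions (the memo learns per-concept offsets and vote weights from the VehicleSemanticParts annotations).

```python
# Minimal sketch of visual-concept extraction and compositional voting.
# Hypothetical parameter choices throughout; see lead-in for assumptions.
import numpy as np
import torch
import torchvision.models as models
from sklearn.cluster import KMeans

# 1. Extract internal feature vectors at every spatial position.
features = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
pool4 = torch.nn.Sequential(*list(features.children())[:24])  # through pool4

def feature_vectors(images):
    """images: (N, 3, 224, 224) tensor -> (M, 512) array of per-position
    feature vectors, L2-normalized as 'population activity'."""
    with torch.no_grad():
        fmap = pool4(images)                           # (N, 512, 14, 14)
    vecs = fmap.permute(0, 2, 3, 1).reshape(-1, 512).numpy()
    return vecs / (np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-8)

# 2. Visual concepts = K-means cluster centers of the feature population.
images = torch.randn(8, 3, 224, 224)    # stand-in for PASCAL3D+ vehicle crops
K = 64                                  # illustrative; the memo tunes this
concepts = KMeans(n_clusters=K, n_init=4, random_state=0).fit(
    feature_vectors(images))

# 3. Compositional voting (simplified): each concept that fires at a
# position casts a vote, displaced by that concept's typical offset to
# the semantic part; evidence from several concepts accumulates.
offsets = np.zeros((K, 2), dtype=int)   # placeholder; learned in the memo

def vote_map(fmap_vecs, grid_hw, threshold=0.5):
    h, w = grid_hw
    assign = concepts.predict(fmap_vecs)               # concept id per position
    dist = np.linalg.norm(
        fmap_vecs - concepts.cluster_centers_[assign], axis=1)
    votes = np.zeros((h, w))
    for idx, (k, d) in enumerate(zip(assign, dist)):
        if d < threshold:                              # concept k fires here
            y, x = divmod(idx, w)
            vy, vx = y + offsets[k, 0], x + offsets[k, 1]
            if 0 <= vy < h and 0 <= vx < w:
                votes[vy, vx] += 1.0 - d               # closer match, bigger vote
    return votes                                       # peaks ~ part detections
```

For a single image, `vote_map(feature_vectors(img.unsqueeze(0)), (14, 14))` yields a 14x14 evidence map whose local maxima are candidate part detections. This additive structure also suggests the intuition behind the occlusion result in the abstract: a part can still win the vote when some, but not all, of its supporting visual concepts are occluded.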


Bibliographic Details

Main Authors: Wang, Jianyu; Zhang, Zhishuai; Xie, Cihang; Zhou, Yuyin; Premachandran, Vittal; Zhu, Jun; Xie, Lingxi; Yuille, Alan L.
Format: Technical Report
Language: English (en_US)
Published: Center for Brains, Minds and Machines (CBMM), 2018
Series: CBMM Memo Series; 087
Institution: Massachusetts Institute of Technology
Online Access: http://hdl.handle.net/1721.1/115182
Funding: This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.