Visual concepts and compositional voting

It is very attractive to formulate vision in terms of pattern theory [26], where patterns are defined hierarchically by compositions of elementary building blocks. But applying pattern theory to real-world images is very challenging and is currently less successful than discriminative methods such as deep networks. Deep networks, however, are black boxes that are hard to interpret and, as we will show, can easily be fooled by adding occluding objects. It is natural to wonder whether, by better understanding deep networks, we can extract building blocks for developing pattern-theoretic models. This motivates us to study the internal feature vectors of a deep network using images of vehicles from the PASCAL3D+ dataset with the scale of objects fixed. We use clustering algorithms, such as K-means, to study the population activity of the features and extract a set of visual concepts, which we show are visually tight and correspond to semantic parts of the vehicles. To analyze this in more detail, we annotate these vehicles by their semantic parts to create a new dataset, which we call VehicleSemanticParts, and evaluate visual concepts as unsupervised semantic part detectors. Our results show that visual concepts perform fairly well but are outperformed by supervised discriminative methods such as Support Vector Machines. We next give a more detailed analysis of visual concepts and how they relate to semantic parts. Following this analysis, we use the visual concepts as building blocks for a simple pattern-theoretic model, which we call compositional voting, in which several visual concepts combine to detect semantic parts. We show that this approach is significantly better than discriminative methods like Support Vector Machines and deep networks trained specifically for semantic part detection. Finally, we return to studying occlusion by creating an annotated dataset with occlusion, called VehicleOcclusion, and show that compositional voting outperforms even deep networks when the amount of occlusion becomes large.
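The pipeline the abstract describes — cluster a network's internal feature vectors into "visual concepts", then let several concepts vote for a semantic part — can be sketched briefly. The following is a minimal illustration, not the authors' code: the choice of a VGG-style pool4 layer, the value of K, the firing threshold, and the all-zero spatial offsets are placeholder assumptions (the memo learns per-concept offsets and vote weights from the VehicleSemanticParts annotations).

```python
# Minimal sketch of visual-concept extraction and compositional voting.
# Hypothetical parameter choices throughout; see lead-in for assumptions.
import numpy as np
import torch
import torchvision.models as models
from sklearn.cluster import KMeans

# 1. Extract internal feature vectors at every spatial position.
features = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
pool4 = torch.nn.Sequential(*list(features.children())[:24])  # through pool4

def feature_vectors(images):
    """images: (N, 3, 224, 224) tensor -> (M, 512) array of per-position
    feature vectors, L2-normalized as 'population activity'."""
    with torch.no_grad():
        fmap = pool4(images)                           # (N, 512, 14, 14)
    vecs = fmap.permute(0, 2, 3, 1).reshape(-1, 512).numpy()
    return vecs / (np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-8)

# 2. Visual concepts = K-means cluster centers of the feature population.
images = torch.randn(8, 3, 224, 224)    # stand-in for PASCAL3D+ vehicle crops
K = 64                                  # illustrative; the memo tunes this
concepts = KMeans(n_clusters=K, n_init=4, random_state=0).fit(
    feature_vectors(images))

# 3. Compositional voting (simplified): each concept that fires at a
# position casts a vote, displaced by that concept's typical offset to
# the semantic part; evidence from several concepts accumulates.
offsets = np.zeros((K, 2), dtype=int)   # placeholder; learned in the memo

def vote_map(fmap_vecs, grid_hw, threshold=0.5):
    h, w = grid_hw
    assign = concepts.predict(fmap_vecs)               # concept id per position
    dist = np.linalg.norm(
        fmap_vecs - concepts.cluster_centers_[assign], axis=1)
    votes = np.zeros((h, w))
    for idx, (k, d) in enumerate(zip(assign, dist)):
        if d < threshold:                              # concept k fires here
            y, x = divmod(idx, w)
            vy, vx = y + offsets[k, 0], x + offsets[k, 1]
            if 0 <= vy < h and 0 <= vx < w:
                votes[vy, vx] += 1.0 - d               # closer match, bigger vote
    return votes                                       # peaks ~ part detections
```

For a single image, `vote_map(feature_vectors(img.unsqueeze(0)), (14, 14))` yields a 14x14 evidence map whose local maxima are candidate part detections. This additive structure also suggests the intuition behind the occlusion result in the abstract: a part can still win the vote when some, but not all, of its supporting visual concepts are occluded.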


Bibliographic Details

Main Authors: Wang, Jianyu; Zhang, Zhishuai; Xie, Cihang; Zhou, Yuyin; Premachandran, Vittal; Zhu, Jun; Xie, Lingxi; Yuille, Alan L.
Format: Technical Report
Language: English (en_US)
Published: Center for Brains, Minds and Machines (CBMM), 2018
Series: CBMM Memo Series; 087
Institution: Massachusetts Institute of Technology
Online Access: http://hdl.handle.net/1721.1/115182
Funding: This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.