Object detectors emerge in Deep Scene CNNs

Bibliographic details
Main authors: Zhou, Bolei; Khosla, Aditya; Lapedriza Garcia, Agata; Oliva, Aude; Torralba, Antonio
Other authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: en_US
Published: 2015
Online access: http://hdl.handle.net/1721.1/96942
https://orcid.org/0000-0002-0007-3352
https://orcid.org/0000-0002-3570-4396
https://orcid.org/0000-0003-4915-0256
Description
Abstract: With the success of new computational architectures for visual processing, such as convolutional neural networks (CNNs), and access to image databases with millions of labeled examples (e.g., ImageNet, Places), the state of the art in computer vision is advancing rapidly. One important factor for continued progress is to understand the representations that are learned by the inner layers of these deep architectures. Here we show that object detectors emerge from training CNNs to perform scene classification. As scenes are composed of objects, the CNN for scene classification automatically discovers meaningful object detectors, representative of the learned scene categories. With object detectors emerging as a result of learning to recognize scenes, our work demonstrates that the same network can perform both scene recognition and object localization in a single forward pass, without ever having been explicitly taught the notion of objects.