Object detectors emerge in Deep Scene CNNs

With the success of new computational architectures for visual processing, such as convolutional neural networks (CNN) and access to image databases with millions of labeled examples (e.g., ImageNet, Places), the state of the art in computer vision is advancing rapidly. One important factor for cont...

Bibliographic Details
Main authors: Zhou, Bolei; Khosla, Aditya; Lapedriza Garcia, Agata; Oliva, Aude; Torralba, Antonio
Other authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: en_US
Published: 2015
Online access: http://hdl.handle.net/1721.1/96942
https://orcid.org/0000-0002-0007-3352
https://orcid.org/0000-0002-3570-4396
https://orcid.org/0000-0003-4915-0256
collection MIT
description With the success of new computational architectures for visual processing, such as convolutional neural networks (CNNs), and access to image databases with millions of labeled examples (e.g., ImageNet, Places), the state of the art in computer vision is advancing rapidly. One important factor for continued progress is to understand the representations that are learned by the inner layers of these deep architectures. Here we show that object detectors emerge from training CNNs to perform scene classification. As scenes are composed of objects, the CNN for scene classification automatically discovers meaningful object detectors, representative of the learned scene categories. With object detectors emerging as a result of learning to recognize scenes, our work demonstrates that the same network can perform both scene recognition and object localization in a single forward pass, without ever having been explicitly taught the notion of objects.
first_indexed 2024-09-23T16:54:45Z
format Article
id mit-1721.1/96942
institution Massachusetts Institute of Technology
language en_US
last_indexed 2024-09-23T16:54:45Z
publishDate 2015
record_format dspace
Record: mit-1721.1/96942 (2022-10-03T09:05:06Z)
Affiliations: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Funding: National Science Foundation (U.S.) (Grant 1016862); United States. Office of Naval Research. Multidisciplinary University Research Initiative (N000141010933); Google (Firm); Xerox Corporation
Dates: 2015-05-08T16:56:01Z; 2015-05
Type: Article; http://purl.org/eprint/type/ConferencePaper
Citation: Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in Deep Scene CNNs." 2015 International Conference on Learning Representations, May 7-9, 2015.
Published in: Proceedings of the 2015 International Conference on Learning Representations (http://www.iclr.cc/doku.php?id=iclr2015:main#conference_schedule)
License: Creative Commons Attribution-Noncommercial-Share Alike, http://creativecommons.org/licenses/by-nc-sa/4.0/
File format: application/pdf
Source: arXiv