Learning Deep Features for Discriminative Localization

Bibliographic Details
Main Authors: Zhou, Bolei, Khosla, Aditya, Lapedriza Garcia, Agata, Oliva, Aude, Torralba, Antonio
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: en_US
Published: Institute of Electrical and Electronics Engineers (IEEE) 2017
Online Access: http://hdl.handle.net/1721.1/112986
Author ORCID iDs: https://orcid.org/0000-0002-3570-4396, https://orcid.org/0000-0002-0007-3352, https://orcid.org/0000-0003-4915-0256
Description: In this work, we revisit the global average pooling layer proposed in, and shed light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means of regularizing training, we find that it actually builds a generic localizable deep representation that exposes the implicit attention of CNNs on an image. Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014 without training on any bounding box annotation. We demonstrate in a variety of experiments that our network is able to localize the discriminative image regions despite being trained only to solve a classification task.
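
The description says that global average pooling (GAP) followed by a linear classifier exposes where a CNN "looks" in an image. The NumPy sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. The (K, H, W) feature maps and (C, K) classifier weights are random stand-ins for a trained network's last convolutional layer and output layer, all function names are hypothetical, and the thresholded bounding box (with an illustrative 0.2 fraction and no connected-component step) is a simplification rather than the paper's exact localization procedure. Projecting one class's weights back onto the feature maps gives a spatial map, commonly called a class activation map (CAM).

```python
import numpy as np


def global_average_pool(feature_maps: np.ndarray) -> np.ndarray:
    """Average each of K feature maps over its spatial grid: (K, H, W) -> (K,)."""
    return feature_maps.mean(axis=(1, 2))


def class_scores(feature_maps: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Linear layer on the pooled features: (C, K) @ (K,) -> (C,) pre-softmax scores."""
    return weights @ global_average_pool(feature_maps)


def class_activation_map(feature_maps: np.ndarray, weights: np.ndarray,
                         class_idx: int) -> np.ndarray:
    """Weight each feature map by one class's classifier weight and sum: (H, W)."""
    return np.tensordot(weights[class_idx], feature_maps, axes=1)


def bbox_from_cam(cam: np.ndarray, frac: float = 0.2) -> tuple:
    """Hypothetical helper: tight box (row0, row1, col0, col1) around every cell
    whose min-max-normalized activation is >= frac (assumed simplification)."""
    lo, hi = cam.min(), cam.max()
    norm = (cam - lo) / (hi - lo) if hi > lo else np.ones_like(cam)
    mask = norm >= frac  # never empty for frac <= 1: the peak cell normalizes to 1
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1])


# Usage with random stand-in data (hypothetical sizes, not taken from the paper).
rng = np.random.default_rng(0)
K, H, W, C = 8, 7, 7, 5
feats = rng.random((K, H, W))          # stand-in for last conv-layer activations
weights = rng.standard_normal((C, K))  # stand-in for trained classifier weights
scores = class_scores(feats, weights)
top = int(np.argmax(scores))
cam = class_activation_map(feats, weights, top)
# Linearity: the spatial mean of the map reproduces the class score.
print(np.isclose(cam.mean(), scores[top]))
print(bbox_from_cam(cam))
```

Because the output layer is linear, each class score is exactly the spatial mean of its map, which is one way to see how a network trained only on image-level labels can still indicate where its evidence lies. The min-max normalization and single tight box were chosen for brevity.
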
Departments: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Media Laboratory; Program in Media Arts and Sciences (Massachusetts Institute of Technology)
Sponsors: National Science Foundation (U.S.) (Grant IIS-1524817); Google (Firm) (Faculty Research Award)
Type: Article; Conference Paper (http://purl.org/eprint/type/ConferencePaper)
Conference: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27-30 June 2016, Las Vegas, NV
ISBN: 978-1-4673-8851-1
DOI: http://dx.doi.org/10.1109/CVPR.2016.319
Citation: Zhou, Bolei, et al. "Learning Deep Features for Discriminative Localization." 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27-30 June, 2016, Las Vegas, NV, IEEE, 2016, pp. 2921–2929.
Dates: 2016-06; 2016-12; deposited 2017-12-29
Rights: Creative Commons Attribution-Noncommercial-Share Alike (http://creativecommons.org/licenses/by-nc-sa/4.0/)
File format: application/pdf
Source: arXiv