Learning Deep Features for Discriminative Localization

In this work, we revisit the global average pooling layer proposed in, and shed light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means for re...


Bibliographic Details
Main Authors:	Zhou, Bolei; Khosla, Aditya; Lapedriza Garcia, Agata; Oliva, Aude; Torralba, Antonio
Other Authors:	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format:	Article
Language:	en_US
Published:	Institute of Electrical and Electronics Engineers (IEEE) 2017
Online Access:	http://hdl.handle.net/1721.1/112986 https://orcid.org/0000-0002-3570-4396 https://orcid.org/0000-0002-0007-3352 https://orcid.org/0000-0003-4915-0256

collection	MIT
description	In this work, we revisit the global average pooling layer proposed in, and shed light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means for regularizing training, we find that it actually builds a generic localizable deep representation that exposes the implicit attention of CNNs on an image. Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014 without training on any bounding box annotation. We demonstrate in a variety of experiments that our network is able to localize the discriminative image regions despite just being trained for solving classification task.
first_indexed	2024-09-23T17:03:23Z
format	Article
id	mit-1721.1/112986
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T17:03:23Z
publishDate	2017
publisher	Institute of Electrical and Electronics Engineers (IEEE)
record_format	dspace
spelling	mit-1721.1/112986 2022-09-29T23:22:03Z Learning Deep Features for Discriminative Localization Zhou, Bolei; Khosla, Aditya; Lapedriza Garcia, Agata; Oliva, Aude; Torralba, Antonio Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology. Media Laboratory Program in Media Arts and Sciences (Massachusetts Institute of Technology) Zhou, Bolei; Khosla, Aditya; Lapedriza Garcia, Agata; Oliva, Aude; Torralba, Antonio In this work, we revisit the global average pooling layer proposed in, and shed light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means for regularizing training, we find that it actually builds a generic localizable deep representation that exposes the implicit attention of CNNs on an image. Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014 without training on any bounding box annotation. We demonstrate in a variety of experiments that our network is able to localize the discriminative image regions despite just being trained for solving classification task. National Science Foundation (U.S.) (Grant IIS-1524817) Google (Firm) (Faculty Research Award) 2017-12-29T19:29:08Z 2017-12-29T19:29:08Z 2016-12 2016-06 Article http://purl.org/eprint/type/ConferencePaper 978-1-4673-8851-1 http://hdl.handle.net/1721.1/112986 Zhou, Bolei, et al. "Learning Deep Features for Discriminative Localization." 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27-30 June, 2016, Las Vegas, NV, IEEE, 2016, pp. 2921–29. https://orcid.org/0000-0002-3570-4396 https://orcid.org/0000-0002-0007-3352 https://orcid.org/0000-0003-4915-0256 en_US http://dx.doi.org/10.1109/CVPR.2016.319 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf Institute of Electrical and Electronics Engineers (IEEE) arXiv


Similar Items