Learning Deep Features for Discriminative Localization
In this work, we revisit the global average pooling layer and shed light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means for re...
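Main Authors: | Zhou, Bolei; Khosla, Aditya; Lapedriza Garcia, Agata; Oliva, Aude; Torralba, Antonio |
---|---|
Other Authors: | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |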
Format: | Article |
Language: | en_US |
Published: | Institute of Electrical and Electronics Engineers (IEEE) 2017 |
Online Access: | http://hdl.handle.net/1721.1/112986 https://orcid.org/0000-0002-3570-4396 https://orcid.org/0000-0002-0007-3352 https://orcid.org/0000-0003-4915-0256 |
_version_ | 1826217421463093248 |
---|---|
author | Zhou, Bolei Khosla, Aditya Lapedriza Garcia, Agata Oliva, Aude Torralba, Antonio |
author2 | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
author_facet | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Zhou, Bolei Khosla, Aditya Lapedriza Garcia, Agata Oliva, Aude Torralba, Antonio |
author_sort | Zhou, Bolei |
collection | MIT |
description | In this work, we revisit the global average pooling layer and shed light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means for regularizing training, we find that it actually builds a generic localizable deep representation that exposes the implicit attention of CNNs on an image. Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014 without training on any bounding box annotation. We demonstrate in a variety of experiments that our network is able to localize the discriminative image regions despite just being trained to solve a classification task. |
first_indexed | 2024-09-23T17:03:23Z |
format | Article |
id | mit-1721.1/112986 |
institution | Massachusetts Institute of Technology |
language | en_US |
last_indexed | 2024-09-23T17:03:23Z |
publishDate | 2017 |
publisher | Institute of Electrical and Electronics Engineers (IEEE) |
record_format | dspace |
spelling | mit-1721.1/112986 2022-09-29T23:22:03Z Learning Deep Features for Discriminative Localization Zhou, Bolei Khosla, Aditya Lapedriza Garcia, Agata Oliva, Aude Torralba, Antonio Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology. Media Laboratory Program in Media Arts and Sciences (Massachusetts Institute of Technology) Zhou, Bolei Khosla, Aditya Lapedriza Garcia, Agata Oliva, Aude Torralba, Antonio In this work, we revisit the global average pooling layer and shed light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means for regularizing training, we find that it actually builds a generic localizable deep representation that exposes the implicit attention of CNNs on an image. Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014 without training on any bounding box annotation. We demonstrate in a variety of experiments that our network is able to localize the discriminative image regions despite just being trained to solve a classification task. National Science Foundation (U.S.) (Grant IIS-1524817) Google (Firm) (Faculty Research Award) 2017-12-29T19:29:08Z 2017-12-29T19:29:08Z 2016-12 2016-06 Article http://purl.org/eprint/type/ConferencePaper 978-1-4673-8851-1 http://hdl.handle.net/1721.1/112986 Zhou, Bolei, et al. "Learning Deep Features for Discriminative Localization." 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27-30 June, 2016, Las Vegas, NV, IEEE, 2016, pp. 2921–29. https://orcid.org/0000-0002-3570-4396 https://orcid.org/0000-0002-0007-3352 https://orcid.org/0000-0003-4915-0256 en_US http://dx.doi.org/10.1109/CVPR.2016.319 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf Institute of Electrical and Electronics Engineers (IEEE) arXiv |
spellingShingle | Zhou, Bolei Khosla, Aditya Lapedriza Garcia, Agata Oliva, Aude Torralba, Antonio Learning Deep Features for Discriminative Localization |
title | Learning Deep Features for Discriminative Localization |
title_full | Learning Deep Features for Discriminative Localization |
title_fullStr | Learning Deep Features for Discriminative Localization |
title_full_unstemmed | Learning Deep Features for Discriminative Localization |
title_short | Learning Deep Features for Discriminative Localization |
title_sort | learning deep features for discriminative localization |
url | http://hdl.handle.net/1721.1/112986 https://orcid.org/0000-0002-3570-4396 https://orcid.org/0000-0002-0007-3352 https://orcid.org/0000-0003-4915-0256 |
work_keys_str_mv | AT zhoubolei learningdeepfeaturesfordiscriminativelocalization AT khoslaaditya learningdeepfeaturesfordiscriminativelocalization AT lapedrizagarciaagata learningdeepfeaturesfordiscriminativelocalization AT olivaaude learningdeepfeaturesfordiscriminativelocalization AT torralbaantonio learningdeepfeaturesfordiscriminativelocalization |