Leveraging attention‐based visual clue extraction for image classification
Abstract Deep learning‐based approaches have made considerable progress in image classification tasks, but most of the approaches lack interpretability, especially in revealing the decisive information causing the categorization of images. This paper seeks to answer the question of what clues encode...
Main Authors: | Yunbo Cui, Youtian Du, Xue Wang, Hang Wang, Chang Su |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2021-10-01 |
Series: | IET Image Processing |
Subjects: | Image recognition; Computer vision and image processing techniques; Data mining; Neural nets |
Online Access: | https://doi.org/10.1049/ipr2.12280 |
author | Yunbo Cui, Youtian Du, Xue Wang, Hang Wang, Chang Su |
collection | DOAJ |
description | Abstract Deep learning-based approaches have made considerable progress in image classification tasks, but most of these approaches lack interpretability, especially in revealing the decisive information that causes the categorization of images. This paper seeks to answer the question of what clues encode the discriminative visual information between image categories and can help improve classification performance. To this end, an attention-based clue extraction network (ACENet) is introduced to mine the decisive local visual information for image classification. ACENet constructs a clue-attention mechanism, that is, global-local attention, between the image and the visual clue proposals extracted from it, and then introduces a contrastive loss defined over the resulting discrete attention distribution to increase the discriminability of clue proposals. The loss encourages considerable attention to be devoted to discriminative clue proposals, that is, those similar within the same category and dissimilar across categories. Experimental results on the Negative Web Image (NWI) dataset and the public ImageNet2012 dataset demonstrate that ACENet can extract true clues to improve image classification performance, and that it outperforms the baselines and state-of-the-art methods. |
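The description above is the only place this record explains the method, so here is a minimal sketch of the mechanism it outlines: a global image feature attending over local clue-proposal features, with a contrastive term rewarding same-class agreement. The paper defines its loss over the discrete attention distribution itself; this sketch substitutes a supervised-contrastive loss over the attention-pooled clue summaries, which serves the same stated goal. All names (`ClueAttention`, `clue_contrastive_loss`, `tau`) and the exact architecture are assumptions made for illustration, not the authors' released code.

```python
# Hypothetical sketch of the clue-attention idea from the abstract;
# not ACENet as published. Names and loss form are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ClueAttention(nn.Module):
    """Global-local attention: a global image feature attends over local
    clue-proposal features, producing a discrete attention distribution."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)  # projects the global image feature
        self.key = nn.Linear(dim, dim)    # projects each clue proposal
        self.scale = dim ** -0.5

    def forward(self, global_feat: torch.Tensor, clue_feats: torch.Tensor):
        # global_feat: (B, D); clue_feats: (B, N, D) for N proposals per image
        q = self.query(global_feat).unsqueeze(1)                    # (B, 1, D)
        k = self.key(clue_feats)                                    # (B, N, D)
        attn = torch.softmax((q * k).sum(-1) * self.scale, dim=-1)  # (B, N)
        pooled = (attn.unsqueeze(-1) * clue_feats).sum(dim=1)       # (B, D)
        return pooled, attn


def clue_contrastive_loss(pooled: torch.Tensor, labels: torch.Tensor,
                          tau: float = 0.1) -> torch.Tensor:
    """Supervised-contrastive surrogate over attention-pooled clue summaries:
    pulls same-class summaries together, pushes different-class ones apart."""
    z = F.normalize(pooled, dim=-1)                            # (B, D)
    sim = (z @ z.t()) / tau                                    # (B, B)
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~eye   # same-class pairs
    logits = sim.masked_fill(eye, float('-inf'))               # drop self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    denom = pos.sum(dim=1).clamp(min=1)                        # avoid divide-by-zero
    return -(log_prob.masked_fill(~pos, 0.0).sum(dim=1) / denom).mean()


# Toy usage: batch of 8 images, 6 clue proposals each, 128-d features, 4 classes.
if __name__ == "__main__":
    B, N, D = 8, 6, 128
    module = ClueAttention(D)
    pooled, attn = module(torch.randn(B, D), torch.randn(B, N, D))
    loss = clue_contrastive_loss(pooled, torch.randint(0, 4, (B,)))
    print(attn.shape, loss.item())  # torch.Size([8, 6]) and a scalar loss
```

In this toy setup, `attn` plays the role of the discrete attention distribution the abstract refers to: because the loss acts on the attention-pooled summaries, it indirectly rewards placing attention mass on proposals that are similar within a category and dissimilar across categories.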
format | Article |
id | doaj.art-62473bcffd9f4e419d7a2b11e3ebce78 |
institution | Directory Open Access Journal |
issn | 1751-9659; 1751-9667 |
language | English |
publishDate | 2021-10-01 |
publisher | Wiley |
series | IET Image Processing |
spelling | Yunbo Cui, Youtian Du, Xue Wang, Hang Wang, Chang Su. "Leveraging attention-based visual clue extraction for image classification." IET Image Processing, vol. 15, no. 12, pp. 2937-2947, 2021-10-01. Wiley. DOI: 10.1049/ipr2.12280. Affiliations: Yunbo Cui, Youtian Du, Xue Wang, and Hang Wang: Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, No. 28 Xianning West Road, Xi'an, China. Chang Su: Department of Healthcare Policy and Research at Weill Cornell Medicine, Cornell University, Ithaca, New York, USA. Keywords: image recognition; computer vision and image processing techniques; data mining; neural nets. |
title | Leveraging attention‐based visual clue extraction for image classification |
topic | Image recognition; Computer vision and image processing techniques; Data mining; Neural nets |
url | https://doi.org/10.1049/ipr2.12280 |