Leveraging attention‐based visual clue extraction for image classification

Abstract: Deep learning-based approaches have made considerable progress in image classification, but most of them lack interpretability, especially in revealing the decisive information behind the categorization of images. This paper seeks to answer the question of which clues encode the discriminative visual information between image categories and can help improve classification performance. To this end, an attention-based clue extraction network (ACENet) is introduced to mine the decisive local visual information for image classification. ACENet constructs a clue-attention mechanism, i.e. global-local attention, between an image and the visual clue proposals extracted from it, and then introduces a contrastive loss defined over the resulting discrete attention distribution to increase the discriminability of the clue proposals. The loss encourages attention to concentrate on discriminative clue proposals, i.e. those that are similar within a category and dissimilar across categories. Experimental results on the Negative Web Image (NWI) dataset and the public ImageNet2012 dataset demonstrate that ACENet extracts true clues that improve classification performance, outperforming both the baselines and state-of-the-art methods.
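The abstract outlines ACENet's two technical ingredients: global-local attention over clue proposals and a contrastive loss tied to the attention outcome. The sketch below is one plausible reading of that description in PyTorch, not the authors' implementation: the names GlobalLocalAttention and clue_contrastive_loss, the dot-product attention form, and all tensor shapes are hypothetical assumptions. Note also that the paper defines its loss over the discrete attention distribution itself; for concreteness the sketch uses a standard supervised-contrastive loss over the attention-pooled clue features instead.

    # Illustrative sketch only; names, shapes, and the loss form are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GlobalLocalAttention(nn.Module):
        """Scores local clue proposals against the global image feature."""
        def __init__(self, dim):
            super().__init__()
            self.query = nn.Linear(dim, dim)  # projects the global image feature
            self.key = nn.Linear(dim, dim)    # projects the clue-proposal features

        def forward(self, global_feat, clue_feats):
            # global_feat: (B, D); clue_feats: (B, N, D) for N clue proposals
            q = self.query(global_feat).unsqueeze(1)           # (B, 1, D)
            k = self.key(clue_feats)                           # (B, N, D)
            scores = (q * k).sum(-1) / k.size(-1) ** 0.5       # (B, N)
            attn = F.softmax(scores, dim=-1)                   # attention over clues
            pooled = (attn.unsqueeze(-1) * clue_feats).sum(1)  # (B, D) clue summary
            return pooled, attn

    def clue_contrastive_loss(pooled, labels, temperature=0.1):
        # Pulls attention-pooled clue features of the same class together and
        # pushes different classes apart (a supervised-contrastive form).
        z = F.normalize(pooled, dim=-1)
        sim = z @ z.t() / temperature                          # (B, B) similarities
        self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
        sim = sim.masked_fill(self_mask, float('-inf'))        # exclude self pairs
        pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        per_anchor = log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
        return -per_anchor.mean()

In a complete model, these pieces would sit on top of a backbone producing the global image feature and a proposal extractor producing the local clue features, with this loss added to the usual classification loss.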


Bibliographic Details
Main Authors: Yunbo Cui, Youtian Du, Xue Wang, Hang Wang, Chang Su
Affiliations: Yunbo Cui, Youtian Du, Xue Wang, Hang Wang (Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, No. 28 Xianning West Road, Xi'an, China); Chang Su (Department of Healthcare Policy and Research, Weill Cornell Medicine, Cornell University, Ithaca, New York, USA)
Format: Article
Language: English
Published: Wiley, 2021-10-01
Series: IET Image Processing, vol. 15, no. 12, pp. 2937-2947
ISSN: 1751-9659; 1751-9667
Collection: DOAJ (Directory of Open Access Journals)
Subjects: Image recognition; Computer vision and image processing techniques; Data mining; Neural nets
Online Access: https://doi.org/10.1049/ipr2.12280