Leveraging attention‐based visual clue extraction for image classification

Bibliographic Details
Main Authors: Yunbo Cui, Youtian Du, Xue Wang, Hang Wang, Chang Su
Format: Article
Language: English
Published: Wiley 2021-10-01
Series: IET Image Processing
Online Access: https://doi.org/10.1049/ipr2.12280
Description
Summary: Deep learning-based approaches have made considerable progress in image classification tasks, but most of them lack interpretability, especially in revealing the decisive information that causes an image to be assigned to a category. This paper seeks to answer the question of which clues encode the discriminative visual information between image categories and can help improve classification performance. To this end, an attention-based clue extraction network (ACENet) is introduced to mine the decisive local visual information for image classification. ACENet constructs a clue-attention mechanism, that is, global-local attention, between the image and the visual clue proposals extracted from it, and then introduces a contrastive loss defined over the resulting discrete attention distribution to increase the discriminability of clue proposals. The loss encourages attention to concentrate on discriminative clue proposals, that is, those similar within the same category and dissimilar across categories. Experimental results on the Negative Web Image (NWI) dataset and the public ImageNet2012 dataset demonstrate that ACENet extracts true clues that improve image classification performance, outperforming the baselines and state-of-the-art methods.
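
The mechanism the abstract describes, attention from a global image representation over local clue proposals plus a contrastive term that separates categories, can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration: the module names, feature dimensions, scaled dot-product scoring, and hinge-style contrastive formulation are stand-ins, not the architecture or loss ACENet actually specifies in the article.

import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalLocalAttention(nn.Module):
    """Attend from a global image feature to local clue-proposal features
    (hypothetical stand-in for the paper's clue-attention mechanism)."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)  # projects the global image feature
        self.key = nn.Linear(dim, dim)    # projects each clue proposal

    def forward(self, global_feat: torch.Tensor, clue_feats: torch.Tensor):
        # global_feat: (B, D); clue_feats: (B, N, D) for N clue proposals
        q = self.query(global_feat).unsqueeze(1)           # (B, 1, D)
        k = self.key(clue_feats)                           # (B, N, D)
        scores = (q * k).sum(-1) / k.size(-1) ** 0.5       # (B, N)
        attn = F.softmax(scores, dim=-1)                   # attention over clues
        pooled = (attn.unsqueeze(-1) * clue_feats).sum(1)  # attention-weighted clue feature
        return pooled, attn


def contrastive_clue_loss(pooled: torch.Tensor, labels: torch.Tensor,
                          margin: float = 1.0) -> torch.Tensor:
    # Pull attention-pooled clue features together within a category and push
    # them at least `margin` apart across categories. A hinge loss is assumed
    # here; the paper defines its loss over the discrete attention distribution.
    dist = torch.cdist(pooled, pooled)                     # (B, B) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)      # (B, B) same-category mask
    diag = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos = dist[same & ~diag]                               # same-category pairs
    neg = F.relu(margin - dist[~same])                     # different-category pairs
    pos_term = pos.mean() if pos.numel() else dist.new_zeros(())
    neg_term = neg.mean() if neg.numel() else dist.new_zeros(())
    return pos_term + neg_term


# Example with random features: batch of 8 images, 16 clue proposals each,
# 128-dimensional features, 4 categories (all values are arbitrary).
attention = GlobalLocalAttention(128)
pooled, attn = attention(torch.randn(8, 128), torch.randn(8, 16, 128))
loss = contrastive_clue_loss(pooled, torch.randint(0, 4, (8,)))

The design choice worth noting in this sketch is that the contrastive term operates on the attention-pooled features rather than on raw proposals, so gradients flow back through the attention weights and reward the network for attending to proposals that discriminate between categories.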
ISSN: 1751-9659, 1751-9667