Localization-Aware Adaptive Pairwise Margin Loss for Fine-Grained Image Recognition


Bibliographic Details
Main Authors: Taehung Kim, Hoseong Kim, Hyeran Byun
Format: Article
Language: English
Published: IEEE, 2021-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9313990/
Description
Summary: Fine-grained image recognition is a highly challenging problem because the differences between images are subtle. Many approaches address it with data augmentation or by jointly optimizing deep metric learning objectives. CutMix is an effective data augmentation strategy that crops a region from one image and pastes it into another to generate new training images. However, it sometimes produces meaningless or occluded object images that degrade recognition performance. We propose a novel framework that solves this problem by extending CutMix with a part-localization method. We further improve recognition accuracy by jointly optimizing a pairwise margin loss on the images generated by the improved CutMix. Some generated images are similar to the reference image because they are created by replacing visually similar parts of it; since these generated images and the reference image share similar semantic meaning, they should not lie much farther than the margin from the reference in the embedding space. The conventional margin loss, however, cannot account for generated images that fall far beyond the margin. To address this, we propose an additional margin loss that considers those generated images. The proposed framework consists of two stages: part localization-aware CutMix and an adaptive pairwise margin loss. The proposed method achieves state-of-the-art performance on the CUB-200-2011, FGVC-Aircraft, Stanford Cars, and DeepFashion datasets, and extensive experiments demonstrate that each stage contributes to the final performance.
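For context, the baseline CutMix operation the abstract builds on can be sketched as follows. This is a generic NumPy illustration of standard CutMix (sample a mixing ratio, paste a proportionally sized random patch from a second image, and recompute the ratio from the actual patch area), not the authors' localization-aware variant or code; the function name and parameters are illustrative.

```python
import numpy as np

def cutmix(img_a, img_b, alpha=1.0, rng=None):
    """Basic CutMix sketch: paste a random patch of img_b into img_a.

    Returns the mixed image and lam, the fraction of pixels still
    belonging to img_a, which is used to mix the two labels.
    """
    rng = rng or np.random.default_rng()
    h, w = img_a.shape[:2]
    lam = rng.beta(alpha, alpha)

    # Side lengths scale with sqrt(1 - lam) so the box area is ~(1 - lam).
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    cy, cx = int(rng.integers(0, h)), int(rng.integers(0, w))
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    # Recompute lam from the actual (possibly clipped) box area.
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    return mixed, lam
```

The random box placement is exactly what the abstract criticizes: the patch may cover background or occlude the discriminative object part, which is what the paper's part localization-aware stage is designed to avoid.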
ISSN:2169-3536