Attribute‐guided transformer for robust person re‐identification
Abstract Recent studies reveal the crucial role of local features in learning robust and discriminative representations for person re‐identification (Re‐ID). Existing approaches typically rely on external tasks, for example semantic segmentation or pose estimation, to locate identifiable parts of...
Main Authors: | Zhe Wang, Jun Wang, Junliang Xing |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2023-12-01 |
Series: | IET Computer Vision |
Subjects: | computer vision; object recognition |
Online Access: | https://doi.org/10.1049/cvi2.12215 |
_version_ | 1797388016059154432 |
---|---|
author | Zhe Wang; Jun Wang; Junliang Xing
author_facet | Zhe Wang; Jun Wang; Junliang Xing
author_sort | Zhe Wang |
collection | DOAJ |
description | Abstract Recent studies reveal the crucial role of local features in learning robust and discriminative representations for person re‐identification (Re‐ID). Existing approaches typically rely on external tasks, for example semantic segmentation or pose estimation, to locate identifiable parts of given images. However, they heuristically utilise the predictions from off‐the‐shelf models, which may be sub‐optimal in terms of both local partition and computational efficiency. They also ignore the mutual information with other inputs, which weakens the representation capabilities of local features. In this study, the authors put forward a novel Attribute‐guided Transformer (AiT), which explicitly exploits pedestrian attributes as semantic priors for discriminative representation learning. Specifically, the authors first introduce an attribute learning process, which generates a set of attention maps highlighting the informative parts of pedestrian images. Then, the authors design a Feature Diffusion Module (FDM) to iteratively inject attribute information into global feature maps, aiming at suppressing unnecessary noise and inferring attribute‐aware representations. Last, the authors propose a Feature Aggregation Module (FAM) to exploit mutual information for aggregating attribute characteristics from different images, enhancing the representation capabilities of feature embeddings. Extensive experiments demonstrate the superiority of the authors' AiT in learning robust and discriminative representations. As a result, the authors achieve competitive performance with state‐of‐the‐art methods on several challenging benchmarks without any bells and whistles.
first_indexed | 2024-03-08T22:33:41Z |
format | Article |
id | doaj.art-c0f1d1f9a26948d29611c4ea4ebf6b94 |
institution | Directory Open Access Journal |
issn | 1751-9632; 1751-9640
language | English |
last_indexed | 2024-03-08T22:33:41Z |
publishDate | 2023-12-01 |
publisher | Wiley |
record_format | Article |
series | IET Computer Vision |
spelling | doaj.art-c0f1d1f9a26948d29611c4ea4ebf6b94; indexed 2023-12-17T15:35:00Z; eng; Wiley; IET Computer Vision; ISSN 1751-9632, 1751-9640; 2023-12-01; vol. 17, no. 8, pp. 977–992; doi:10.1049/cvi2.12215; Attribute‐guided transformer for robust person re‐identification; Zhe Wang (School of Electronics and Information Engineering, Beihang University, Beijing, China); Jun Wang (School of Electronics and Information Engineering, Beihang University, Beijing, China); Junliang Xing (Department of Computer Science and Technology, Tsinghua University, Beijing, China); (abstract as in the description field above); https://doi.org/10.1049/cvi2.12215; computer vision; object recognition
spellingShingle | Zhe Wang; Jun Wang; Junliang Xing; Attribute‐guided transformer for robust person re‐identification; IET Computer Vision; computer vision; object recognition
title | Attribute‐guided transformer for robust person re‐identification |
title_full | Attribute‐guided transformer for robust person re‐identification |
title_fullStr | Attribute‐guided transformer for robust person re‐identification |
title_full_unstemmed | Attribute‐guided transformer for robust person re‐identification |
title_short | Attribute‐guided transformer for robust person re‐identification |
title_sort | attribute guided transformer for robust person re identification |
topic | computer vision; object recognition
url | https://doi.org/10.1049/cvi2.12215 |
work_keys_str_mv | AT zhewang attributeguidedtransformerforrobustpersonreidentification AT junwang attributeguidedtransformerforrobustpersonreidentification AT junliangxing attributeguidedtransformerforrobustpersonreidentification |
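
The description field above sketches a three-stage pipeline: an attribute head that yields attention maps over pedestrian images, a Feature Diffusion Module (FDM) that iteratively injects attribute information into the global feature map, and a Feature Aggregation Module (FAM) that aggregates attribute characteristics across images. The record does not include the paper's implementation, so the following is only a minimal PyTorch-style sketch of how such modules could look, assuming cross-attention for both the injection and the aggregation step; all class names, shapes, iteration counts, and hyper-parameters are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of the FDM/FAM idea described in the abstract.
# Shapes and layer choices are illustrative guesses, not the authors'
# implementation.
import torch
import torch.nn as nn


class FeatureDiffusionModule(nn.Module):
    """Iteratively injects attribute tokens into the global feature map."""

    def __init__(self, dim: int, num_heads: int = 8, num_iters: int = 2):
        super().__init__()
        self.num_iters = num_iters
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor, attrs: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, C) flattened global feature-map tokens
        # attrs: (B, A, C) attribute embeddings pooled under attention maps
        for _ in range(self.num_iters):
            injected, _ = self.attn(query=feats, key=attrs, value=attrs)
            feats = self.norm(feats + injected)  # residual update per iteration
        return feats


class FeatureAggregationModule(nn.Module):
    """Mixes attribute characteristics across the images in a batch."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, embeds: torch.Tensor) -> torch.Tensor:
        # embeds: (B, C), one embedding per image; treat the batch as a
        # sequence so each image can borrow attribute cues from the others.
        seq = embeds.unsqueeze(0)                 # (1, B, C)
        mixed, _ = self.attn(seq, seq, seq)
        return self.norm(seq + mixed).squeeze(0)  # (B, C)
```

A plausible forward pass under these assumptions would flatten the backbone's feature map into (B, N, C) tokens, run the FDM against the pooled attribute embeddings, pool the result to one vector per image, and refine that vector with the FAM before the Re-ID losses.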