Attribute‐guided transformer for robust person re‐identification

Abstract: Recent studies reveal the crucial role of local features in learning robust and discriminative representations for person re-identification (Re-ID). Existing approaches typically rely on external tasks, such as semantic segmentation or pose estimation, to locate the identifiable parts of given images. However, they heuristically utilise predictions from off-the-shelf models, which may be sub-optimal in terms of both local partitioning and computational efficiency, and they ignore the mutual information shared with other inputs, which weakens the representation capability of local features. In this study, the authors put forward a novel Attribute-guided Transformer (AiT), which explicitly exploits pedestrian attributes as semantic priors for discriminative representation learning. Specifically, they first introduce an attribute learning process that generates a set of attention maps highlighting the informative parts of pedestrian images. They then design a Feature Diffusion Module (FDM) to iteratively inject attribute information into the global feature maps, aiming to suppress unnecessary noise and infer attribute-aware representations. Finally, they propose a Feature Aggregation Module (FAM) that exploits mutual information to aggregate attribute characteristics from different images, enhancing the representation capability of the feature embeddings. Extensive experiments demonstrate the superiority of AiT in learning robust and discriminative representations; as a result, it achieves competitive performance with state-of-the-art methods on several challenging benchmarks without any bells and whistles.
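The abstract gives only a high-level view of the architecture. As a rough illustration of the idea behind the FDM, the following is a minimal PyTorch sketch in which learned attribute tokens first attend to patch features and are then attended back into the feature map. The class name, shapes, number of attribute tokens, and the use of standard multi-head attention are all assumptions made for illustration, not the authors' actual implementation.

import torch
import torch.nn as nn

class AttributeDiffusionBlock(nn.Module):
    """Hypothetical FDM-style block: an illustrative sketch, not the paper's code."""

    def __init__(self, dim=768, num_attributes=12, num_heads=8):
        super().__init__()
        # One learnable query token per pedestrian attribute (e.g. hat, backpack).
        self.attr_tokens = nn.Parameter(torch.randn(num_attributes, dim) * 0.02)
        self.gather = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.inject = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, patches):
        # patches: (B, N, dim), the flattened global feature map of a batch of images.
        B = patches.size(0)
        queries = self.attr_tokens.unsqueeze(0).expand(B, -1, -1)
        # Attribute tokens gather evidence from the informative patches ...
        attr_feats, _ = self.gather(queries, patches, patches)
        # ... and are diffused back into the feature map, suppressing
        # attribute-irrelevant noise (the paper applies this iteratively).
        diffused, _ = self.inject(patches, attr_feats, attr_feats)
        return self.norm(patches + diffused)

x = torch.randn(2, 196, 768)               # two images, 14 x 14 patch tokens
print(AttributeDiffusionBlock()(x).shape)  # torch.Size([2, 196, 768])

On this reading, a FAM would analogously exchange such attribute features across the images of a batch (for instance, by attending over the attribute tokens of all batch members) so that each embedding benefits from mutual information; consult the paper for the actual design.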

Bibliographic Details
Main Authors: Zhe Wang, Jun Wang (School of Electronics and Information Engineering, Beihang University, Beijing, China); Junliang Xing (Department of Computer Science and Technology, Tsinghua University, Beijing, China)
Format: Article
Language: English
Published: Wiley, 2023-12-01
Series: IET Computer Vision, vol. 17, no. 8, pp. 977-992
ISSN: 1751-9632, 1751-9640
Subjects: computer vision; object recognition
Online Access: https://doi.org/10.1049/cvi2.12215