Attribute‐guided transformer for robust person re‐identification

Abstract: Recent studies reveal the crucial role of local features in learning robust and discriminative representations for person re-identification (Re-ID). Existing approaches typically rely on external tasks, such as semantic segmentation or pose estimation, to locate the identifiable parts of given images. However, they heuristically utilise predictions from off-the-shelf models, which may be sub-optimal in terms of both local partitioning and computational efficiency, and they ignore the mutual information shared with other inputs, which weakens the representation capability of local features. In this study, the authors put forward a novel Attribute-guided Transformer (AiT), which explicitly exploits pedestrian attributes as semantic priors for discriminative representation learning. Specifically, they first introduce an attribute learning process that generates a set of attention maps highlighting the informative parts of pedestrian images. They then design a Feature Diffusion Module (FDM) to iteratively inject attribute information into the global feature maps, aiming to suppress unnecessary noise and infer attribute-aware representations. Finally, they propose a Feature Aggregation Module (FAM) that exploits mutual information to aggregate attribute characteristics from different images, enhancing the representation capability of the feature embeddings. Extensive experiments demonstrate the superiority of AiT in learning robust and discriminative representations; as a result, it achieves competitive performance with state-of-the-art methods on several challenging benchmarks without any bells and whistles.
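The abstract gives only a high-level view of the architecture. As a rough illustration of the idea behind the FDM, the following is a minimal PyTorch sketch in which learned attribute tokens first attend to patch features and are then attended back into the feature map. The class name, shapes, number of attribute tokens, and the use of standard multi-head attention are all assumptions made for illustration, not the authors' actual implementation.

import torch
import torch.nn as nn

class AttributeDiffusionBlock(nn.Module):
    """Hypothetical FDM-style block: an illustrative sketch, not the paper's code."""

    def __init__(self, dim=768, num_attributes=12, num_heads=8):
        super().__init__()
        # One learnable query token per pedestrian attribute (e.g. hat, backpack).
        self.attr_tokens = nn.Parameter(torch.randn(num_attributes, dim) * 0.02)
        self.gather = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.inject = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, patches):
        # patches: (B, N, dim), the flattened global feature map of a batch of images.
        B = patches.size(0)
        queries = self.attr_tokens.unsqueeze(0).expand(B, -1, -1)
        # Attribute tokens gather evidence from the informative patches ...
        attr_feats, _ = self.gather(queries, patches, patches)
        # ... and are diffused back into the feature map, suppressing
        # attribute-irrelevant noise (the paper applies this iteratively).
        diffused, _ = self.inject(patches, attr_feats, attr_feats)
        return self.norm(patches + diffused)

x = torch.randn(2, 196, 768)               # two images, 14 x 14 patch tokens
print(AttributeDiffusionBlock()(x).shape)  # torch.Size([2, 196, 768])

On this reading, a FAM would analogously exchange such attribute features across the images of a batch (for instance, by attending over the attribute tokens of all batch members) so that each embedding benefits from mutual information; consult the paper for the actual design.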

Bibliographic Details
Main Authors: Zhe Wang, Jun Wang (School of Electronics and Information Engineering, Beihang University, Beijing, China); Junliang Xing (Department of Computer Science and Technology, Tsinghua University, Beijing, China)
Format: Article
Language: English
Published: Wiley, 2023-12-01
Series: IET Computer Vision, vol. 17, no. 8, pp. 977-992
ISSN: 1751-9632, 1751-9640
Subjects: computer vision; object recognition
Online Access: https://doi.org/10.1049/cvi2.12215