Unsupervised Domain Adaptive Person Re-Identification Method Based on Transformer

Person re-identification (ReID) is the problem of retrieving the same target across cameras. Extracting robust and discriminative features is the key to correctly associating targets. A model based on convolutional neural networks (CNNs) can extract robust image features. Still,...

Bibliographic Details
Main Authors: Xiai Yan, Shengkai Ding, Wei Zhou, Weiqi Shi, Hua Tian
Format: Article
Language: English
Published: MDPI AG 2022-09-01
Series: Electronics
Subjects: person re-identification (ReID); unsupervised domain adaptive; transformer; vision transformer
Online Access: https://www.mdpi.com/2079-9292/11/19/3082
_version_ 1797479864767348736
author Xiai Yan
Shengkai Ding
Wei Zhou
Weiqi Shi
Hua Tian
author_facet Xiai Yan
Shengkai Ding
Wei Zhou
Weiqi Shi
Hua Tian
author_sort Xiai Yan
collection DOAJ
description Person re-identification (ReID) is the problem of retrieving the same target across cameras. Extracting robust and discriminative features is the key to correctly associating targets. A model based on convolutional neural networks (CNNs) can extract robust image features, but it moves from local to global information only gradually, by stacking convolution layers. A vision transformer (ViT), by contrast, captures global information from the beginning and so extracts more powerful features. This paper proposes an unsupervised domain adaptive person re-identification model (ViTReID) based on the vision transformer. It takes a ViT model trained on ImageNet as the pre-training weights and uses a transformer encoder as the feature extraction network, which makes up for some defects of the CNN model. The network is optimized with a combined loss function made up of cross-entropy loss, triplet loss, and center loss. In addition, the person’s head is evaluated and trained as a local feature and combined with the global feature of the whole body, to enhance the head feature information. The experimental results show that, on Market1501 → MSMT17, ViTReID exceeds the baseline method (SSG) by 14% in mean average precision (mAP). On MSMT17 → Market1501, ViTReID is 1.2% higher in rank-1 (R1) accuracy than a state-of-the-art method (SPCL). On PersonX → MSMT17, its mAP is 3.1% higher than that of the MMT-dbscan method, and on PersonX → Market1501, its mAP is 1.5% higher than that of the MMT-dbscan method.
first_indexed 2024-03-09T21:51:56Z
format Article
id doaj.art-7b141a3168a94cecab403968df9f46d4
institution Directory Open Access Journal
issn 2079-9292
language English
last_indexed 2024-03-09T21:51:56Z
publishDate 2022-09-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-7b141a3168a94cecab403968df9f46d4
2023-11-23T20:06:00Z
eng
MDPI AG
Electronics
2079-9292
2022-09-01
11
19
3082
10.3390/electronics11193082
Unsupervised Domain Adaptive Person Re-Identification Method Based on Transformer
Xiai Yan (0), Shengkai Ding (1), Wei Zhou (2): School of Computer Science and School of Cyberspace Science, Xiangtan University, Xiangtan 411105, China
Weiqi Shi (3), Hua Tian (4): Department of Information Technology, Hunan Police Academy, Changsha 410138, China
https://www.mdpi.com/2079-9292/11/19/3082
person re-identification (ReID)
unsupervised domain adaptive
transformer
vision transformer
spellingShingle Xiai Yan
Shengkai Ding
Wei Zhou
Weiqi Shi
Hua Tian
Unsupervised Domain Adaptive Person Re-Identification Method Based on Transformer
Electronics
person re-identification (ReID)
unsupervised domain adaptive
transformer
vision transformer
title Unsupervised Domain Adaptive Person Re-Identification Method Based on Transformer
title_full Unsupervised Domain Adaptive Person Re-Identification Method Based on Transformer
title_fullStr Unsupervised Domain Adaptive Person Re-Identification Method Based on Transformer
title_full_unstemmed Unsupervised Domain Adaptive Person Re-Identification Method Based on Transformer
title_short Unsupervised Domain Adaptive Person Re-Identification Method Based on Transformer
title_sort unsupervised domain adaptive person re identification method based on transformer
topic person re-identification (ReID)
unsupervised domain adaptive
transformer
vision transformer
url https://www.mdpi.com/2079-9292/11/19/3082
work_keys_str_mv AT xiaiyan unsuperviseddomainadaptivepersonreidentificationmethodbasedontransformer
AT shengkaiding unsuperviseddomainadaptivepersonreidentificationmethodbasedontransformer
AT weizhou unsuperviseddomainadaptivepersonreidentificationmethodbasedontransformer
AT weiqishi unsuperviseddomainadaptivepersonreidentificationmethodbasedontransformer
AT huatian unsuperviseddomainadaptivepersonreidentificationmethodbasedontransformer
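
The description above says the network is optimized with a combination of cross-entropy, triplet, and center losses, but this record does not give the weights, the margin, or the triplet mining strategy. The following is only a minimal sketch of how such a combination is commonly written, for a single anchor sample. The function names, the margin of 0.3, the center-loss weight of 0.0005, and the unweighted sum of the first two terms are all assumptions made for illustration and are not taken from the paper.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, using log-sum-exp for stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on the gap between anchor-positive and anchor-negative distances.

    The margin value is an assumption, not a value from the record.
    """
    d_ap = math.sqrt(sq_dist(anchor, positive))
    d_an = math.sqrt(sq_dist(anchor, negative))
    return max(0.0, d_ap - d_an + margin)


def center_loss(feature, center):
    """Half the squared distance between a feature and its class center."""
    return 0.5 * sq_dist(feature, center)


def combined_loss(logits, label, anchor, positive, negative, center,
                  margin=0.3, center_weight=0.0005):
    """Sum of the three terms; the weighting scheme here is hypothetical."""
    return (cross_entropy(logits, label)
            + triplet_loss(anchor, positive, negative, margin)
            + center_weight * center_loss(anchor, center))


if __name__ == "__main__":
    # Toy values: 3 identity classes, 2-dimensional features.
    loss = combined_loss(
        logits=[2.0, 0.5, -1.0], label=0,
        anchor=[1.0, 0.0], positive=[0.8, 0.1], negative=[-1.0, 0.2],
        center=[0.9, 0.05],
    )
    print(f"combined loss: {loss:.4f}")
```

In an unsupervised domain adaptive setting, the labels for the target domain would typically be pseudo-labels from clustering rather than ground truth; that step is outside this sketch.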