Unsupervised Domain Adaptive Person Re-Identification Method Based on Transformer
Person re-identification (ReID) is the problem of cross-camera target retrieval. The extraction of robust and discriminant features is the key factor in realizing the correct correlation of targets. A model based on convolutional neural networks (CNNs) can extract more robust image features. Still,...
Main Authors: | Xiai Yan, Shengkai Ding, Wei Zhou, Weiqi Shi, Hua Tian |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-09-01 |
Series: | Electronics |
Subjects: | person re-identification (ReID); unsupervised domain adaptive; transformer; vision transformer |
Online Access: | https://www.mdpi.com/2079-9292/11/19/3082 |
_version_ | 1797479864767348736 |
---|---|
author | Xiai Yan; Shengkai Ding; Wei Zhou; Weiqi Shi; Hua Tian |
author_facet | Xiai Yan; Shengkai Ding; Wei Zhou; Weiqi Shi; Hua Tian |
author_sort | Xiai Yan |
collection | DOAJ |
description | Person re-identification (ReID) is the problem of cross-camera target retrieval. The extraction of robust and discriminant features is the key factor in realizing the correct correlation of targets. A model based on convolutional neural networks (CNNs) can extract more robust image features. Still, it completes the extraction of images from local information to global information by continuously accumulating convolution layers. As a complex CNN, a vision transformer (ViT) captures global information from the beginning to extract more powerful features. This paper proposes an unsupervised domain adaptive person re-identification model (ViTReID) based on the vision transformer, taking the ViT model trained on ImageNet as the pre-training weight and a transformer encoder as the feature extraction network, which makes up for some defects of the CNN model. At the same time, the combined loss function of cross-entropy and triplet loss function combined with the center loss function is used to optimize the network; the person’s head is evaluated and trained as a local feature combined with the global feature of the whole body, focusing on the head, to enhance the head feature information. The experimental results show that ViTReID exceeds the baseline method (SSG) by 14% (Market1501 → MSMT17) in mean average precision (mAP). In MSMT17 → Market1501, ViTReID is 1.2% higher in rank-1 (R1) accuracy than a state-of-the-art method (SPCL); in PersonX → MSMT17, the mAP is 3.1% higher than that of the MMT-dbscan method, and in PersonX → Market1501, the mAP is 1.5% higher than that of the MMT-dbscan method. |
first_indexed | 2024-03-09T21:51:56Z |
format | Article |
id | doaj.art-7b141a3168a94cecab403968df9f46d4 |
institution | Directory Open Access Journal |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-09T21:51:56Z |
publishDate | 2022-09-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
spelling | doaj.art-7b141a3168a94cecab403968df9f46d4; 2023-11-23T20:06:00Z; eng; MDPI AG; Electronics; ISSN 2079-9292; 2022-09-01; vol. 11, no. 19, art. 3082; DOI 10.3390/electronics11193082; Unsupervised Domain Adaptive Person Re-Identification Method Based on Transformer; Xiai Yan, Shengkai Ding, Wei Zhou (School of Computer Science and School of Cyberspace Science, Xiangtan University, Xiangtan 411105, China); Weiqi Shi, Hua Tian (Department of Information Technology, Hunan Police Academy, Changsha 410138, China); https://www.mdpi.com/2079-9292/11/19/3082; person re-identification (ReID); unsupervised domain adaptive; transformer; vision transformer |
spellingShingle | Xiai Yan; Shengkai Ding; Wei Zhou; Weiqi Shi; Hua Tian; Unsupervised Domain Adaptive Person Re-Identification Method Based on Transformer; Electronics; person re-identification (ReID); unsupervised domain adaptive; transformer; vision transformer |
title | Unsupervised Domain Adaptive Person Re-Identification Method Based on Transformer |
title_full | Unsupervised Domain Adaptive Person Re-Identification Method Based on Transformer |
title_fullStr | Unsupervised Domain Adaptive Person Re-Identification Method Based on Transformer |
title_full_unstemmed | Unsupervised Domain Adaptive Person Re-Identification Method Based on Transformer |
title_short | Unsupervised Domain Adaptive Person Re-Identification Method Based on Transformer |
title_sort | unsupervised domain adaptive person re identification method based on transformer |
topic | person re-identification (ReID); unsupervised domain adaptive; transformer; vision transformer |
url | https://www.mdpi.com/2079-9292/11/19/3082 |
work_keys_str_mv | AT xiaiyan unsuperviseddomainadaptivepersonreidentificationmethodbasedontransformer AT shengkaiding unsuperviseddomainadaptivepersonreidentificationmethodbasedontransformer AT weizhou unsuperviseddomainadaptivepersonreidentificationmethodbasedontransformer AT weiqishi unsuperviseddomainadaptivepersonreidentificationmethodbasedontransformer AT huatian unsuperviseddomainadaptivepersonreidentificationmethodbasedontransformer |
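
The description field above mentions a combined objective built from cross-entropy, triplet, and center losses. The following is a minimal sketch of how such a combination is commonly written, not the authors' implementation. The function names, the unweighted sum of the first two terms, the margin of 0.3, and the center-loss weight of 0.0005 are all illustrative assumptions; the record does not state any of them.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge loss asking the positive to be closer to the anchor than the negative by `margin`.

    The margin value is an assumption for illustration.
    """
    d_ap = math.sqrt(squared_distance(anchor, positive))
    d_an = math.sqrt(squared_distance(anchor, negative))
    return max(0.0, d_ap - d_an + margin)


def center_loss(feature, center):
    """Half the squared distance between a feature and its class (or cluster) center."""
    return 0.5 * squared_distance(feature, center)


def combined_loss(logits, label, anchor, positive, negative, center,
                  margin=0.3, center_weight=0.0005):
    """Sum of the three terms; the weighting scheme is an assumption, not taken from the paper."""
    return (cross_entropy(logits, label)
            + triplet_loss(anchor, positive, negative, margin)
            + center_weight * center_loss(anchor, center))


if __name__ == "__main__":
    loss = combined_loss(
        logits=[0.0, 0.0], label=0,
        anchor=[0.0, 0.0], positive=[3.0, 4.0], negative=[0.0, 1.0],
        center=[1.0, 2.0],
    )
    print(f"combined loss: {loss:.6f}")
```

In an unsupervised domain adaptive setting the `label` and `center` inputs would typically come from pseudo-labels produced by clustering target-domain features, which is why the sketch treats them as plain arguments rather than ground truth.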