Multi-Level Fusion Temporal–Spatial Co-Attention for Video-Based Person Re-Identification
With insufficient data, a convolutional neural network can easily fall into local minima, and its training is unstable. Many current methods address these problems by adding pedestrian attributes, pedestrian postures, and other auxiliary information, but such information requires additional collection, which is time-consuming and laborious. Every frame of a video sequence has a different degree of similarity. In this paper, multi-level fusion temporal–spatial co-attention is adopted to improve person re-identification (reID). On small datasets, the improved network better prevents over-fitting and reduces the dataset limitation.
Main Authors: | Shengyu Pei, Xiaoping Fan |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-12-01 |
Series: | Entropy |
Subjects: | video-based person re-identification; multi-level fusion; temporal–spatial co-attention; knowledge evolution |
Online Access: | https://www.mdpi.com/1099-4300/23/12/1686 |
_version_ | 1797504886986768384 |
---|---|
author | Shengyu Pei, Xiaoping Fan
author_facet | Shengyu Pei, Xiaoping Fan
author_sort | Shengyu Pei |
collection | DOAJ |
description | With insufficient data, a convolutional neural network can easily fall into local minima, and its training is unstable. Many current methods address these problems by adding pedestrian attributes, pedestrian postures, and other auxiliary information, but such information requires additional collection, which is time-consuming and laborious. Every frame of a video sequence has a different degree of similarity. In this paper, multi-level fusion temporal–spatial co-attention is adopted to improve person re-identification (reID). On small datasets, the improved network better prevents over-fitting and reduces the dataset limitation. Specifically, the concept of knowledge evolution is introduced into video-based person re-identification to improve the backbone residual neural network (ResNet). A global branch, a local branch, and an attention branch are used in parallel for feature extraction. Three high-level features are embedded in the metric learning network to improve the network’s generalization ability and the accuracy of video-based person re-identification. Experiments on the small datasets PRID2011 and iLIDS-VID show that the improved network better prevents over-fitting; experiments on MARS and DukeMTMC-VideoReID show that the proposed method extracts more feature information and improves the network’s generalization ability. The results show that our method achieves better performance: the model achieves 90.15% Rank-1 and 81.91% mAP on MARS. |
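The abstract describes three parallel branches (global, local, attention) whose pooled features are fused into one representation per video clip. The following is a minimal numpy sketch of that fusion idea only; the shapes, the random scoring vector, and the stripe-pooling of the local branch are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

T, D = 8, 2048  # frames per clip, per-frame feature dim (illustrative)
frames = rng.standard_normal((T, D))  # stand-in for ResNet frame features

# Global branch: temporal average pooling over all frames.
global_feat = frames.mean(axis=0)

# Attention branch: softmax-weighted temporal pooling, so similar,
# informative frames contribute more than redundant ones.
w = rng.standard_normal(D)            # stand-in for a learned scoring vector
scores = frames @ w / np.sqrt(D)
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()                  # temporal attention weights, sum to 1
attn_feat = alpha @ frames

# Local branch: split each feature into 4 parts and pool each part
# separately over time (a crude stand-in for part-based features).
local_feat = frames.reshape(T, 4, D // 4).mean(axis=0).reshape(-1)

# Multi-level fusion: concatenate the three branch features.
fused = np.concatenate([global_feat, attn_feat, local_feat])
print(fused.shape)  # → (6144,)
```

In a real system each branch would feed a metric-learning head (e.g. a triplet loss) rather than a raw concatenation, but the sketch shows how the three pooled views combine into one clip-level descriptor.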
first_indexed | 2024-03-10T04:10:47Z |
format | Article |
id | doaj.art-1044c09cd5a44d0b9841cba1f8946eea |
institution | Directory Open Access Journal |
issn | 1099-4300 |
language | English |
last_indexed | 2024-03-10T04:10:47Z |
publishDate | 2021-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Entropy |
spelling | doaj.art-1044c09cd5a44d0b9841cba1f8946eea2023-11-23T08:11:41ZengMDPI AGEntropy1099-43002021-12-012312168610.3390/e23121686Multi-Level Fusion Temporal–Spatial Co-Attention for Video-Based Person Re-IdentificationShengyu Pei0Xiaoping Fan1School of Automation, Central South University, Changsha 410075, ChinaSchool of Automation, Central South University, Changsha 410075, ChinaA convolutional neural network can easily fall into local minima for insufficient data, and the needed training is unstable. Many current methods are used to solve these problems by adding pedestrian attributes, pedestrian postures, and other auxiliary information, but they require additional collection, which is time-consuming and laborious. Every video sequence frame has a different degree of similarity. In this paper, multi-level fusion temporal–spatial co-attention is adopted to improve person re-identification (reID). For a small dataset, the improved network can better prevent over-fitting and reduce the dataset limit. Specifically, the concept of knowledge evolution is introduced into video-based person re-identification to improve the backbone residual neural network (ResNet). The global branch, local branch, and attention branch are used in parallel for feature extraction. Three high-level features are embedded in the metric learning network to improve the network’s generalization ability and the accuracy of video-based person re-identification. Simulation experiments are implemented on small datasets PRID2011 and iLIDS-VID, and the improved network can better prevent over-fitting. Experiments are also implemented on MARS and DukeMTMC-VideoReID, and the proposed method can be used to extract more feature information and improve the network’s generalization ability. The results show that our method achieves better performance. The model achieves 90.15% Rank1 and 81.91% mAP on MARS.https://www.mdpi.com/1099-4300/23/12/1686video-based person re-identificationmulti-level fusiontemporal–spatial co-attentionknowledge evolution |
spellingShingle | Shengyu Pei Xiaoping Fan Multi-Level Fusion Temporal–Spatial Co-Attention for Video-Based Person Re-Identification Entropy video-based person re-identification multi-level fusion temporal–spatial co-attention knowledge evolution |
title | Multi-Level Fusion Temporal–Spatial Co-Attention for Video-Based Person Re-Identification |
title_full | Multi-Level Fusion Temporal–Spatial Co-Attention for Video-Based Person Re-Identification |
title_fullStr | Multi-Level Fusion Temporal–Spatial Co-Attention for Video-Based Person Re-Identification |
title_full_unstemmed | Multi-Level Fusion Temporal–Spatial Co-Attention for Video-Based Person Re-Identification |
title_short | Multi-Level Fusion Temporal–Spatial Co-Attention for Video-Based Person Re-Identification |
title_sort | multi level fusion temporal spatial co attention for video based person re identification |
topic | video-based person re-identification multi-level fusion temporal–spatial co-attention knowledge evolution |
url | https://www.mdpi.com/1099-4300/23/12/1686 |
work_keys_str_mv | AT shengyupei multilevelfusiontemporalspatialcoattentionforvideobasedpersonreidentification AT xiaopingfan multilevelfusiontemporalspatialcoattentionforvideobasedpersonreidentification |