Non-Local Spatial and Temporal Attention Network for Video-Based Person Re-Identification

Bibliographic Details
Main Authors:	Zheng Liu, Feixiang Du, Wang Li, Xu Liu, Qiang Zou
Format:	Article
Language:	English
Published:	MDPI AG 2020-08-01
Series:	Applied Sciences
Subjects:	person Re-ID; video; non-local; spatial-temporal attention
Online Access:	https://www.mdpi.com/2076-3417/10/15/5385

Description
Summary:	Given a video of a person, the video-based person re-identification (Re-ID) task aims to identify the same person in videos captured by different cameras. A crucial challenge is how to embed the spatial-temporal information of a video into its feature representation. Most existing methods fail to make full use of the relationships between frames during feature extraction. In this work, we propose a plug-and-play non-local attention module (NLAM) for frame-level feature extraction. Based on global spatial attention and channel attention, NLAM helps the network locate the person in each frame. In addition, we propose a non-local temporal pooling (NLTP) method for aggregating temporal features, which effectively captures long-range, global dependencies among the frames of a video. Our model achieved strong results on several datasets compared with state-of-the-art methods. In particular, it reached a rank-1 accuracy of 86.3% on the MARS (Motion Analysis and Re-identification Set) dataset without re-ranking, 1.4% higher than the previous state-of-the-art method. On the DukeMTMC-VideoReID (Duke Multi-Target Multi-Camera Video Reidentification) dataset, our method also performed well, with 95% rank-1 accuracy and 94.5% mAP (mean Average Precision).
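The summary does not say exactly how NLTP is computed. The following is a minimal, hypothetical sketch of the general idea of non-local temporal aggregation, assuming it resembles a standard non-local (self-attention) block applied across frame-level feature vectors, followed by temporal average pooling. The function name `non_local_temporal_pool` and the choice of identity maps for the usual theta/phi/g projections are illustrative assumptions, not the authors' implementation. Each frame's refined feature is a softmax-weighted sum over all frames, so every frame can draw on the whole sequence regardless of temporal distance.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def non_local_temporal_pool(frames: List[Vector]) -> Vector:
    """Hypothetical sketch: non-local attention across frames, then mean pooling.

    frames: T frame-level feature vectors, each of dimension D.
    Assumption: the theta, phi and g projections are identity maps.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    refined = []
    for x_i in frames:
        # Affinity of frame i with every frame j (including itself), normalized over j.
        weights = softmax([dot(x_i, x_j) for x_j in frames])
        # Weighted sum over all frames: a global, long-range temporal context for frame i.
        y_i = [sum(w * x_j[d] for w, x_j in zip(weights, frames)) for d in range(dim)]
        # Residual connection, as in the standard non-local block.
        refined.append([a + b for a, b in zip(x_i, y_i)])
    # Temporal average pooling of the refined frame features into one video-level vector.
    return [sum(z[d] for z in refined) / len(refined) for d in range(dim)]


if __name__ == "__main__":
    video = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
    print(non_local_temporal_pool(video))
```

In a real network the projections would be learned layers operating on tensors; plain lists are used here only to keep the sketch self-contained.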
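The reported rank-1 accuracy and mAP are standard retrieval metrics. As a simplified illustration, the sketch below ranks gallery items for each query by distance, counts a rank-1 hit when the nearest gallery item has the query's identity, and averages per-query average precision. The function names are hypothetical, and the sketch assumes a precomputed query-by-gallery distance matrix. It omits the camera-based filtering used in the official MARS and DukeMTMC-VideoReID protocols and simply skips queries that have no matching identity in the gallery.

```python
from typing import List, Sequence, Tuple


def average_precision(ranked_match: Sequence[bool]) -> float:
    """AP for one query, given match flags for gallery items sorted by distance."""
    hits = 0
    precisions = []
    for k, is_match in enumerate(ranked_match, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / k)  # precision at each correct retrieval
    return sum(precisions) / len(precisions) if precisions else 0.0


def rank1_and_map(dist: List[List[float]], query_ids: List[int],
                  gallery_ids: List[int]) -> Tuple[float, float]:
    """Simplified rank-1 accuracy and mAP (no same-camera filtering)."""
    rank1_hits = 0
    aps = []
    for q, row in enumerate(dist):
        # Sort gallery indices by increasing distance to this query.
        order = sorted(range(len(row)), key=lambda g: row[g])
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        if not any(matches):
            continue  # skip queries with no true match in the gallery
        rank1_hits += int(matches[0])
        aps.append(average_precision(matches))
    if not aps:
        raise ValueError("no query has a matching gallery identity")
    return rank1_hits / len(aps), sum(aps) / len(aps)


if __name__ == "__main__":
    d = [[0.1, 0.5, 0.9], [0.1, 0.2, 0.9]]
    print(rank1_and_map(d, query_ids=[1, 2], gallery_ids=[1, 2, 2]))
```

In the example, the first query retrieves its match first (AP = 1.0), while the second query's nearest item is a wrong identity, so its matches appear at ranks 2 and 3 (AP = (1/2 + 2/3) / 2).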
ISSN:	2076-3417

Similar Items