Person Re-Identification Based on Two-Stream Network With Attention and Pose Features

Bibliographic Details
Main Authors: Xiaowei Gong, Suguo Zhu
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/8795487/
Description
Summary: Person re-identification (Re-ID) remains a challenging task because of pose variation, blurring, occlusion, and other problems. In this paper, we combine the advantages of pose estimation and the attention mechanism in a two-stream network to address these problems and improve performance. Our proposed method consists of two main parts. 1) Spatial Features with Fusion Multi-Layer Features and Attention: the same pedestrian appears in different poses under different camera angles, so simple spatial information alone is no longer reliable. It therefore becomes important to distinguish view-invariant features from multiple semantic levels. We fuse the mid-level and high-level features and then correlate global information through self-attention. Because the fused features carry richer semantic information, the attention mechanism can better focus on the important areas of the image. 2) Aggregation Attention Stream and Pose Estimation Stream Features: although the self-attention mechanism can automatically attend to the important areas of the image, it may focus too heavily on the prominent parts of the body and ignore the edge information of the body. Hence, guidance from the pedestrian's pose is needed so that self-attention attends to all parts of the body. Finally, we use bilinear pooling to aggregate the features of the two streams into the final features. Without any data augmentation or re-ranking, our method achieves rank-1 accuracy of 93.3% and 85.5% on the Market1501 and DukeMTMC-reID datasets, respectively, which indicates its effectiveness.
ISSN: 2169-3536
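
The pipeline described in the summary (fuse mid-level and high-level features, apply self-attention, then aggregate the attention stream and the pose stream with bilinear pooling) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: the function names, fusion by channel concatenation, scaled dot-product attention with identity projections, global average pooling, the signed square root and L2 normalization after the outer product, and the toy numbers. Plain Python lists are used only to keep the sketch self-contained; a practical version would operate on CNN feature maps in a tensor library.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_levels(mid, high):
    """Fuse mid- and high-level features by concatenating channels per position.
    Assumption: both inputs cover the same N spatial positions."""
    return [m + h for m, h in zip(mid, high)]


def self_attention(feats):
    """Scaled dot-product self-attention over N position vectors.
    Assumption: identity query/key/value projections (no learned weights)."""
    channels = len(feats[0])
    scale = math.sqrt(channels)
    attended = []
    for query in feats:
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in feats]
        weights = softmax(scores)
        # Each output is a weighted sum of all positions (global context).
        attended.append(
            [sum(w * value[c] for w, value in zip(weights, feats))
             for c in range(channels)]
        )
    return attended


def global_avg_pool(feats):
    """Average the position vectors into a single descriptor."""
    count = len(feats)
    return [sum(f[c] for f in feats) / count for c in range(len(feats[0]))]


def bilinear_pool(x, y):
    """Bilinear pooling of two vectors: flattened outer product,
    followed by signed square root and L2 normalization (assumed here)."""
    outer = [a * b for a in x for b in y]
    signed = [math.copysign(math.sqrt(abs(v)), v) for v in outer]
    norm = math.sqrt(sum(v * v for v in signed)) or 1.0
    return [v / norm for v in signed]


# Toy inputs: 3 spatial positions, 2 mid-level and 1 high-level channel.
mid = [[0.2, 0.1], [0.4, 0.3], [0.0, 0.5]]
high = [[1.0], [0.5], [0.2]]

attention_vec = global_avg_pool(self_attention(fuse_levels(mid, high)))
pose_vec = [0.3, 0.7]  # hypothetical stand-in for the pose-stream descriptor

final = bilinear_pool(attention_vec, pose_vec)
print(len(final))
```

The final descriptor has one entry per pair of channels from the two streams (3 x 2 in this toy case), which is why bilinear pooling captures pairwise interactions between attention features and pose features rather than simply concatenating them.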