Extended Global–Local Representation Learning for Video Person Re-Identification

Person re-identification has recently become a research hotspot in computer vision and has received extensive attention from the academic community. Inspired by part-based research on image ReID, this paper presents a novel feature learning and extraction framework for video-based person re-identification, namely the extended global-local representation learning network (E-GLRN). Given a video sequence of a pedestrian, E-GLRN extracts holistic and local features simultaneously. For global feature learning, a channel-attention convolutional neural network (CNN) and bidirectional long short-term memory (Bi-LSTM) networks form a CNN-LSTM module that learns features from consecutive frames. The local feature learning module extracts key local information, also using Bi-LSTM networks. To obtain local features more effectively, the work defines a "main image group" consisting of three representative frames; the local feature representation of a video is obtained by exploiting the spatial contextual and appearance information of this group. The local and global features are complementary and are combined into a discriminative, robust representation of the video sequence. Extensive experiments on three video-based ReID datasets, iLIDS-VID, PRID2011, and MARS, demonstrate that the proposed method outperforms state-of-the-art video-based re-identification approaches.
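For readers who want a concrete picture of the kind of pipeline the abstract describes, below is a minimal PyTorch sketch of a two-branch global-local video feature extractor. It is not the authors' E-GLRN implementation: the tiny backbone, the squeeze-and-excitation form of channel attention, the stripe count, and the first/middle/last rule used to pick the three-frame "main image group" are all illustrative assumptions.

# Minimal sketch of a global-local video feature extractor in the spirit of the
# abstract above. NOT the authors' E-GLRN code: backbone, attention form, stripe
# count, and the frame-selection rule are assumptions made for illustration.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (assumed form)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (N, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))          # squeeze: per-channel statistics
        return x * w[:, :, None, None]           # excite: reweight channels


class GlobalLocalVideoNet(nn.Module):
    def __init__(self, channels=128, hidden=128, num_stripes=4):
        super().__init__()
        self.num_stripes = num_stripes
        # Tiny CNN stand-in for the per-frame backbone.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.attn = ChannelAttention(channels)
        # Global branch: Bi-LSTM over per-frame descriptors of the whole clip.
        self.global_lstm = nn.LSTM(channels, hidden, batch_first=True, bidirectional=True)
        # Local branch: Bi-LSTM over horizontal-stripe descriptors of three key frames.
        self.local_lstm = nn.LSTM(channels, hidden, batch_first=True, bidirectional=True)

    def forward(self, clip):                      # clip: (N, T, 3, H, W)
        n, t = clip.shape[:2]
        maps = self.attn(self.cnn(clip.flatten(0, 1)))        # (N*T, C, h, w)
        frame_desc = maps.mean(dim=(2, 3)).view(n, t, -1)     # (N, T, C)

        # Global feature: temporally pooled Bi-LSTM outputs over all frames.
        g_out, _ = self.global_lstm(frame_desc)               # (N, T, 2*hidden)
        global_feat = g_out.mean(dim=1)

        # "Main image group": here simply first, middle, last frame (assumption).
        keep = [0, t // 2, t - 1]
        key_maps = maps.view(n, t, *maps.shape[1:])[:, keep]  # (N, 3, C, h, w)
        # Split each kept frame into horizontal stripes, pool each stripe.
        parts = key_maps.chunk(self.num_stripes, dim=-2)
        stripe_desc = torch.stack([p.mean(dim=(-2, -1)) for p in parts], dim=2)  # (N, 3, S, C)
        l_out, _ = self.local_lstm(stripe_desc.flatten(1, 2))  # (N, 3*S, 2*hidden)
        local_feat = l_out.mean(dim=1)

        # Complementary global + local representation of the clip.
        return torch.cat([global_feat, local_feat], dim=1)     # (N, 4*hidden)


# Example: a batch of 2 clips, 8 frames each, 128x64 pedestrian crops.
feat = GlobalLocalVideoNet()(torch.randn(2, 8, 3, 128, 64))
print(feat.shape)   # torch.Size([2, 512])

The design mirrored here is the one the abstract emphasizes: the global branch summarizes whole frames over time, the local branch summarizes spatial parts of a few representative frames, and the two descriptors are concatenated into a single video-level feature.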

Bibliographic Details
Main Authors: Wanru Song (ORCID: 0000-0002-7067-6108), Yahong Wu, Jieying Zheng (ORCID: 0000-0003-4933-4688), Changhong Chen, Feng Liu
Author Affiliation: Jiangsu Key Laboratory of Image Processing and Image Communications, Nanjing University of Posts and Telecommunications, Nanjing, China
Format: Article
Language: English
Published: IEEE, 2019-01-01
Series: IEEE Access, Vol. 7, pp. 122684-122696
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2019.2937974
Subjects: Bi-directional LSTM; feature extraction; global-local feature representation; person re-identification; video
Online Access: https://ieeexplore.ieee.org/document/8818112/