SRNet: Structured Relevance Feature Learning Network From Skeleton Data for Human Action Recognition


Bibliographic Details
Main Authors: Weizhi Nie, Wei Wang, Xiangdong Huang
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/8832127/
Description
Summary: In recent years, human action recognition based on skeleton information has drawn increasing attention with the publication of large-scale skeleton datasets. The most crucial factors for this task lie in two aspects: the intra-frame representation of joint co-occurrences and the inter-frame representation of the skeletons' temporal evolution. The most effective approaches focus on automatic feature extraction using deep learning. However, they ignore the structural information of skeleton joints and the correlation between different skeleton joints for human action recognition. In this paper, we do not simply treat the joint position information as unordered points. Instead, we propose a novel data-reorganizing strategy to represent the global and local structural information of human skeleton joints. Meanwhile, we also employ data mirroring to strengthen the relationships between skeleton joints. Based on this design, we propose an end-to-end multi-dimensional CNN network (SRNet) that fully considers the spatial and temporal information to learn the feature extraction transform function. Specifically, in this network, we apply different convolution kernels along different dimensions to learn skeleton representations, making full use of human structural information to generate robust features. Finally, we compare our method with other state-of-the-art approaches on action recognition datasets including NTU RGB+D, PKU-MMD, SYSU, UT-Kinect, and HDM05. The experimental results demonstrate the superiority of our method.
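
The abstract describes applying different convolution kernels along different dimensions of the skeleton data. The following is a minimal, hypothetical PyTorch sketch of that general idea only; the input layout (batch, 3 coordinates, frames, joints), kernel sizes, channel widths, and class count are illustrative assumptions and do not reproduce the SRNet architecture or its data-reorganizing and mirroring steps.

# Illustrative sketch only: convolutions applied separately along the temporal
# (frame) axis and the joint axis of a skeleton tensor. All hyperparameters
# here are placeholder assumptions, not the configuration reported in the paper.
import torch
import torch.nn as nn


class SkeletonCNNSketch(nn.Module):
    def __init__(self, num_joints: int = 25, num_classes: int = 60):
        super().__init__()
        # Convolve along the temporal axis (frames) with a 1-D kernel per joint.
        self.temporal_conv = nn.Conv2d(3, 32, kernel_size=(9, 1), padding=(4, 0))
        # Convolve along the joint axis to model co-occurrences between joints.
        self.joint_conv = nn.Conv2d(32, 64, kernel_size=(1, 3), padding=(0, 1))
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3, frames, joints)
        x = torch.relu(self.temporal_conv(x))
        x = torch.relu(self.joint_conv(x))
        x = self.pool(x).flatten(1)
        return self.fc(x)


# Usage on random data shaped like a short 25-joint skeleton clip.
model = SkeletonCNNSketch()
clip = torch.randn(2, 3, 64, 25)
print(model(clip).shape)  # torch.Size([2, 60])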
ISSN: 2169-3536