Summary: | In recent years, significant progress has been made in modeling temporal sequences and spatial structures for skeleton-based human action recognition. However, existing methods rely on explicitly modeling the inherent structure of the human body. Because skeleton data are sparse and convolutions are relatively smooth, this can reduce joint saliency and limit interpretability. This paper proposes a feature matching method based on a progressive decoding strategy. Because human movement is a chained process, the strategy progressively decodes human pose features from the center of the body to its periphery, using multi-level graph filters to obtain multi-frequency hierarchical graph features. Adaptive convolution kernels are then constructed to match local similarities between these graph features. The self-similarity of the query set and the mutual similarity between query-set and support-set samples are computed to analyze the overall posture of the human body. Similar skeletal features are then distinguished by the similarity of their node spectra, which separates different action categories. Experiments on two public datasets, NTU RGB+D and Human3.6M, show that the proposed method outperforms existing methods in recognition accuracy and generalizes better in small-sample (few-shot) action recognition.
|
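The summary does not specify the graph filters, the adaptive kernels, or the similarity measures. The following is therefore a rough illustration of the general pipeline only, not the authors' implementation. It is a minimal NumPy sketch under several assumptions. Joints are ordered center-to-periphery by breadth-first hop distance from an assumed center joint. Low-pass filters (I - L/2)^k and a high-pass filter L/2 on the symmetric normalized Laplacian stand in for the "multi-level graph filters". Plain cosine similarity replaces the adaptive convolution kernels for the query self-similarity and the query-support mutual similarity. All function names (`hop_order`, `multi_frequency_features`, `embed`, `classify`), the 6-joint toy skeleton, and the "wave"/"kick" classes are hypothetical.

```python
from collections import deque

import numpy as np


def normalized_laplacian(adj):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}; eigenvalues lie in [0, 2]."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg, dtype=float)
    nz = deg > 0
    d_inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def hop_order(adj, center):
    """Joint indices sorted by BFS hop distance from the (assumed) center joint."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        for v in np.flatnonzero(adj[u]):
            v = int(v)
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # Unreachable joints, if any, are placed last.
    return sorted(range(len(adj)), key=lambda j: dist.get(j, len(adj)))


def multi_frequency_features(x, adj, levels=2):
    """Concatenate the raw signal, `levels` low-pass bands and one high-pass band.

    x has shape (num_joints, channels). Hypothetical stand-in for multi-level graph filters.
    """
    lap = normalized_laplacian(adj)
    low = np.eye(len(adj)) - lap / 2.0   # eigenvalues in [0, 1], largest on smooth signals
    bands, h = [x], x
    for _ in range(levels):
        h = low @ h                      # progressively smoother (lower-frequency) band
        bands.append(h)
    bands.append((lap / 2.0) @ x)        # high-frequency band
    return np.concatenate(bands, axis=1)  # (num_joints, channels * (levels + 2))


def embed(x, adj, center=0, levels=2):
    """Order joint features center-to-periphery, then flatten into one vector."""
    feats = multi_frequency_features(x, adj, levels)
    return feats[hop_order(adj, center)].ravel()


def cosine_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-12)
    return a @ b.T


def classify(query, support, support_labels, adj):
    """Assign each query the support class with the highest mean mutual similarity."""
    q = np.stack([embed(sample, adj) for sample in query])
    s = np.stack([embed(sample, adj) for sample in support])
    self_sim = cosine_matrix(q, q)   # query-query self-similarity
    mutual = cosine_matrix(q, s)     # query-support mutual similarity
    classes = sorted(set(support_labels))
    labels = np.asarray(support_labels)
    scores = np.stack([mutual[:, labels == c].mean(axis=1) for c in classes], axis=1)
    return [classes[i] for i in scores.argmax(axis=1)], self_sim


# Toy 6-joint skeleton: joint 0 is the assumed center, joint 5 hangs off joint 4.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

rng = np.random.default_rng(0)
proto = {"wave": rng.normal(size=(6, 3)), "kick": rng.normal(size=(6, 3))}
support = [proto["wave"], proto["kick"]]
support_labels = ["wave", "kick"]
query = [proto["kick"] + 0.01 * rng.normal(size=(6, 3))]  # a slightly perturbed "kick"
pred, self_sim = classify(query, support, support_labels, adj)
print(pred)
```

Because every step here is linear in the joint features, a lightly perturbed copy of a support sample stays almost parallel to it in embedding space. The real method's adaptive kernels and node-spectrum matching are presumably richer than this cosine-similarity stand-in.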