Summary: | In skeleton-based action recognition, approaches based on graph convolutional networks (GCNs) have achieved remarkable performance by modeling spatial-temporal graphs to explore the physical dependencies between body joints. However, these methods mostly stack hierarchical GCNs to aggregate wider-range neighborhood information, which weakens joint features over long diffusion paths. In this paper, we design a multi-scale mixed dense graph convolutional network (MMDGCN) to overcome these shortcomings. We propose a dense graph convolution operation to enhance the local context information of joints, and then introduce spatial and temporal attention modules with a larger receptive field, which help the model strengthen discriminative features and adaptively refine the intermediate feature maps. We also design a multi-scale mixed temporal convolution module, which provides a flexible temporal graph by combining convolution kernels of different scales. Extensive experiments on three real-world datasets (NTU-RGB+D, NTU-RGB+D120 and Kinetics) demonstrate the performance of the proposed MMDGCN in skeleton-based action recognition.
|