Summary: | This paper proposes a dual-stream 3D space-time convolutional neural network action recognition framework. The original depth map sequence data is set as the input in order to study the global space-time characteristics of each action category. The high correlation within the human action itself is considered in the time domain, and then the deep motion map sequence is introduced as the input to another stream of the 3D space-time convolutional network. Furthermore, the corresponding 3D skeleton sequence data is set as the third input of the whole recognition framework. Although the skeleton sequence data has the advantage of including 3D information, it is also confronted with the problems of the existence of rate change, temporal mismatch and noise. Thus, specially designed space-time features are applied to cope with these problems. The proposed methods allow the whole recognition system to fully exploit and utilize the discriminatory space-time features from different perspectives, and ultimately improve the classification accuracy of the system. Experimental results on different public 3D data sets illustrate the effectiveness of the proposed method.
|