Summary: | Human activity recognition (HAR) has gained popularity in computer vision applications such as video surveillance, security, and virtual reality. However, traditional methods are limited in computational efficiency and in holistic learning of human skeletal sequences. In this article, a new time‐series skeleton joint data imaging method is combined with an improved convolutional neural network to address these problems. First, the raw time‐series data of 33 body nodes are transformed into red–green–blue images by encoding the 3D positional information into one pixel. Second, the LeNet‐5 network is enhanced by expanding the receptive field and by introducing coordinate attention and the smooth maximum unit to improve smoothness and feature extraction. Third, the ability of the coded images to express human activities is studied in various environments. The experimental results show that the method achieves an accuracy of 98.02% in recognizing 25 daily human activities, such as running, writing, and walking. In addition, the number of floating point operations, the number of parameters, and the inference time of the method are 0.08%, 0.47%, and 3.05%, respectively, of the average values for six other networks (including AlexNet, GoogLeNet, and MobileNet). The proposed method is thus a novel, lightweight, and high‐precision solution for HAR.
|
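
The imaging step described in the summary can be illustrated with a minimal sketch. The summary only states that the 3D position of a body node is encoded into one red–green–blue pixel; everything else here is an assumption made for illustration: the image layout (one row per frame, one column per node), the coordinate range of [-1, 1], the linear scaling to 0..255, and the function name `encode_sequence`, which is hypothetical and not taken from the article.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]   # (x, y, z) of one body node
Pixel = Tuple[int, int, int]           # (R, G, B), each in 0..255


def encode_sequence(
    frames: List[List[Point3D]],
    lo: float = -1.0,   # assumed lower bound of each coordinate
    hi: float = 1.0,    # assumed upper bound of each coordinate
) -> List[List[Pixel]]:
    """Encode a skeleton sequence as an RGB image (assumed layout).

    Row t of the result is frame t; column j is body node j.
    x maps to red, y to green, z to blue.
    """
    def to_byte(value: float) -> int:
        # Clip to the assumed range, then scale linearly to 0..255.
        clipped = min(max(value, lo), hi)
        return int((clipped - lo) / (hi - lo) * 255 + 0.5)

    return [
        [(to_byte(x), to_byte(y), to_byte(z)) for (x, y, z) in frame]
        for frame in frames
    ]


# Example: 4 frames of 33 nodes, all placed at the same position.
demo_frames = [[(-1.0, 0.0, 1.0)] * 33 for _ in range(4)]
demo_image = encode_sequence(demo_frames)
```

With this layout, a sequence of T frames becomes a T × 33 image with three channels, which a convolutional network can take as input. The normalization actually used in the article may differ.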