Graph convolution network based skeleton action recognition with DCT features

Bibliographic Details
Main Author: Hei, Hao
Other Authors: Alex Chichung Kot
Format: Final Year Project (FYP)
Language: English
Published: Nanyang Technological University 2023
Subjects:
Online Access: https://hdl.handle.net/10356/172751
Description
Summary: Human Action Recognition (HAR), which aims to decipher human movements from video, has been an important research topic in computer vision for many years because it underpins many innovative technologies and applications. While most recent HAR research has focused on applying Graph Convolutional Networks (GCNs) to the skeleton modality, little attention has been paid to exploiting the frequency representation of skeleton data. The objective of this project is to study the effect of using skeleton features in the frequency domain to perform HAR with GCNs. To this end, we first conduct a thorough review of current approaches to HAR and frequency analysis. Inspired by research on attention mechanisms, we propose combining channel attention with the 2-D Discrete Cosine Transform (DCT) in a universal network layer that exploits the frequency information in skeleton data and can be inserted into existing GCNs to improve classification accuracy. Using the NTU-RGBD dataset, we conducted experiments with three advanced GCN-based models as baselines. The results show that adding the proposed layer improved the human action classification accuracy of all three baseline models. This improvement indicates the effectiveness of frequency information for skeleton action recognition, as well as the potential of attention mechanisms for exploiting that information.
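The summary does not specify how channel attention and the 2-D DCT are combined. The sketch below shows one common arrangement, assumed here for illustration and similar in spirit to frequency channel attention: each channel of a C × T × V skeleton feature map (channels × frames × joints) is squeezed to a scalar by projecting it onto a 2-D DCT-II basis over the frame and joint axes. A small two-layer MLP with a sigmoid then turns these descriptors into per-channel weights, and the feature map is rescaled. The class `DCTChannelAttention`, the helpers `dct_basis` and `dct_descriptor`, the chosen frequency pairs, and the reduction ratio are all hypothetical. Weights are randomly initialized rather than learned, and plain Python lists stand in for a deep learning framework so the example runs without dependencies.

```python
import math
import random


def dct_basis(u, v, T, V):
    """Unnormalized 2-D DCT-II basis for frequency (u, v) over a T x V grid."""
    return [
        [math.cos(math.pi * u * (t + 0.5) / T) * math.cos(math.pi * v * (j + 0.5) / V)
         for j in range(V)]
        for t in range(T)
    ]


def dct_descriptor(x, freqs):
    """Squeeze each channel of x (C x T x V) to a single DCT coefficient.

    Channels are split into len(freqs) equal groups; group k uses freqs[k].
    With freqs == [(0, 0)] this is plain sum pooling over frames and joints.
    """
    C, T, V = len(x), len(x[0]), len(x[0][0])
    group = C // len(freqs)  # assumes C is divisible by len(freqs)
    bases = [dct_basis(u, v, T, V) for (u, v) in freqs]
    return [
        sum(x[c][t][j] * bases[c // group][t][j] for t in range(T) for j in range(V))
        for c in range(C)
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class DCTChannelAttention:
    """Hypothetical DCT + channel-attention layer for a C x T x V feature map."""

    def __init__(self, channels, freqs, reduction=4, seed=0):
        rng = random.Random(seed)
        hidden = max(1, channels // reduction)
        self.freqs = freqs
        # Random weights for illustration; a real network would learn these.
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(channels)] for _ in range(hidden)]
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(channels)]

    def __call__(self, x):
        d = dct_descriptor(x, self.freqs)
        # Fully connected layer + ReLU (channel reduction).
        h = [max(0.0, sum(w * di for w, di in zip(row, d))) for row in self.w1]
        # Fully connected layer + sigmoid gives one weight in (0, 1) per channel.
        a = [sigmoid(sum(w * hi for w, hi in zip(row, h))) for row in self.w2]
        # Rescale every channel of the input by its attention weight.
        y = [[[a[c] * val for val in frame] for frame in x[c]] for c in range(len(x))]
        return y, a


# Usage: 8 channels, 16 frames, 25 joints (the NTU RGB+D skeleton has 25 joints).
C, T, V = 8, 16, 25
rng = random.Random(1)
x = [[[rng.uniform(-1, 1) for _ in range(V)] for _ in range(T)] for _ in range(C)]
layer = DCTChannelAttention(C, freqs=[(0, 0), (1, 0), (0, 1), (1, 1)])
y, weights = layer(x)
print([round(w, 3) for w in weights])
```

Because the (0, 0) basis is constant, a layer that uses only that frequency reduces to the global-average-pooling descriptor of squeeze-and-excitation attention (up to a factor of T·V). Adding higher frequency pairs lets the descriptor respond to how features vary across frames and joints, which is the kind of frequency information such a layer is meant to exploit.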