Human action recognition using artificial intelligence


Bibliographic Details
Main Author: Wang, Ruixian
Other Authors: Yap Kim Hui
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2024
Subjects:
Online Access: https://hdl.handle.net/10356/178232
Description
Summary: The field of computer vision has long focused on Human Action Recognition (HAR), spanning various domains. Over the years, numerous outstanding models have been developed for human action recognition, evolving from early CNN architectures to Two-Stream Networks, and more recently to emerging techniques such as 3D CNNs, Transformer-based models, and efficient modeling. However, these models often perform poorly on videos with longer temporal sequences, known as long actions, compared to shorter sequences. Improving a model's ability to extract features from long temporal sequences remains a challenging task. This dissertation adopts ResNet50 as the primary framework and builds upon the Temporal Shift Module (TSM) [1] to introduce the Multiple Temporal Shift Module (MTSM). MTSM increases the number of temporally shifted video frames within each channel, aiming to enhance the model's feature extraction capability in the temporal dimension. We demonstrate that MTSM achieves promising results, with a top-1 accuracy of 96.4% on the UCF101 dataset, a 3.2% improvement over TSM, and a top-1 accuracy of 74.9% on the HMDB51 dataset, a 4.0% increase. Additionally, we crawled human action videos from the internet and constructed a new video dataset containing 2837 videos divided into 8 categories. On this constructed HAR dataset, MTSM achieved a top-1 accuracy of 71.9%. Our proposed MTSM exhibits improved performance compared to recent models in the field.
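To illustrate the core operation the summary describes, the following is a minimal NumPy sketch of a TSM-style temporal shift, in which a fraction of channels is shifted along the time axis so each frame mixes features from its neighbors. The function name, the `(T, C)` feature layout, and the parameter defaults are illustrative assumptions, not the dissertation's actual implementation; the `shift_frames` parameter hints at the MTSM idea of shifting by more than one frame, but the exact MTSM scheme is not specified in the abstract.

```python
import numpy as np

def temporal_shift(x, shift_frames=1, fold_div=8):
    """TSM-style temporal shift (illustrative sketch, not the thesis code).

    x            : array of shape (T, C) - T frames, C channels per frame.
    shift_frames : number of frames to shift by (TSM uses 1; MTSM, per the
                   abstract, increases the number of temporally shifted frames).
    fold_div     : 1/fold_div of channels shift backward in time,
                   another 1/fold_div shift forward; the rest are unchanged.
    """
    T, C = x.shape
    fold = C // fold_div
    out = np.zeros_like(x)
    # First fold of channels: frame t receives features from frame t + shift.
    out[:T - shift_frames, :fold] = x[shift_frames:, :fold]
    # Second fold of channels: frame t receives features from frame t - shift.
    out[shift_frames:, fold:2 * fold] = x[:T - shift_frames, fold:2 * fold]
    # Remaining channels pass through untouched (zero-padding at the borders
    # for the shifted folds).
    out[:, 2 * fold:] = x[:, 2 * fold:]
    return out
```

Because the shift is a pure memory movement with no learned parameters, it adds temporal receptive field to an otherwise 2D (per-frame) backbone such as ResNet50 at essentially zero extra compute, which is the property TSM-style modules exploit.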