Enhancing Few-Shot Action Recognition Using Skeleton Temporal Alignment and Adversarial Training

Few-shot human action recognition, a prominent area in computer vision, has garnered increasing attention and broader use in real-life scenarios. Extracting spatio-temporal skeletal information from human movement videos offers interpretable and data-efficient features. However, existing spatio-temp...

Bibliographic Details
Main Authors: Qingyang Xu, Jianjun Yang, Hongyi Zhang, Xin Jie, Danushka Bandara
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: Action recognition; few-shot learning; temporal alignment; adversarial training
Online Access: https://ieeexplore.ieee.org/document/10433188/
collection DOAJ
description Few-shot human action recognition, a prominent area in computer vision, has garnered increasing attention and is seeing broader use in real-life scenarios. Extracting spatio-temporal skeletal information from human movement videos offers interpretable and data-efficient features. However, existing spatio-temporal feature encoders face challenges such as handling sequence boundaries and coping with noise. To address these problems, this paper proposes a temporal complement method that optimizes the Dynamic Time Warping (DTW) algorithm based on the feature representation of the human skeleton sequence. DTW finds an optimal alignment between sequences by warping them in the time domain, which is especially useful in scenarios where training data is limited. However, DTW has the drawback that the optimal alignment path is highly sensitive to errors in the time series distance matrix. Therefore, we apply a Virtual Adversarial Training method to improve the noise robustness of the algorithm. Specifically, we introduce adversarial perturbations to the time series distance matrix during the training phase, incentivizing the model to be resilient to such noise. Our method achieves the highest accuracy among the ProtoNet, DTW, and DASTM methods in the 5-way-1-shot setting on the NTU-S (77.7%) and Kinetics (41.2%) datasets. In the 5-way-5-shot setting, our method achieves the highest accuracy, 51.8%, on the Kinetics dataset when compared with the other approaches.
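The alignment step the abstract refers to can be illustrated with a minimal sketch of classic DTW over a pairwise distance matrix. This is the textbook dynamic program, not the paper's temporal complement method or its adversarial training. The function name `dtw`, the toy sequences, and the absolute-difference distance are assumptions chosen for illustration only.

```python
import numpy as np

def dtw(dist):
    """Classic DTW: minimum-cost monotonic alignment through a distance matrix.

    dist[i, j] is the distance between frame i of one sequence and
    frame j of the other. Returns (total_cost, path).
    """
    n, m = dist.shape
    # acc[i, j] holds the best cumulative cost of aligning the first i
    # frames with the first j frames; row/column 0 are padding.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # advance both sequences
                acc[i - 1, j],      # advance the first sequence only
                acc[i, j - 1],      # advance the second sequence only
            )
    # Backtrack from the last cell to the first to recover the path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    return acc[n, m], path[::-1]

# Toy 1-D "sequences" (hypothetical data): b repeats the first frame of a.
a = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([0.0, 0.0, 1.0, 2.0, 3.0])
dist = np.abs(a[:, None] - b[None, :])
cost, path = dtw(dist)
print(cost, path)
```

Because every step of the path is chosen by comparing entries derived from `dist`, a small error in a single cell can redirect the whole path, which is the sensitivity the abstract describes.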
first_indexed 2024-03-07T14:32:47Z
id doaj.art-3cb059fbaf4546918ae503e7a236a259
institution Directory Open Access Journal
issn 2169-3536
last_indexed 2024-03-07T14:32:47Z
spelling doaj.art-3cb059fbaf4546918ae503e7a236a259
Record timestamp: 2024-03-06T00:00:53Z
Language: eng
Publisher: IEEE
Journal: IEEE Access (ISSN 2169-3536)
Published: 2024-01-01
Volume: 12
Pages: 31745-31755
DOI: 10.1109/ACCESS.2024.3365448
IEEE document number: 10433188
Authors and affiliations:
Qingyang Xu, College of Computer Science, Zhejiang University, Hangzhou, China
Jianjun Yang, Department of General Practice, Shandong Provincial Third Hospital, Shandong University, Jinan, China
Hongyi Zhang, School of Software Technology, Zhejiang University, Hangzhou, China
Xin Jie, College of Computer Science, Zhejiang University, Hangzhou, China
Danushka Bandara, Department of Computer Science and Engineering, Fairfield University, Fairfield, CT, USA
ORCID: https://orcid.org/0000-0002-8885-622X
topic Action recognition
few-shot learning
temporal alignment
adversarial training