Enhancing Few-Shot Action Recognition Using Skeleton Temporal Alignment and Adversarial Training

Few-shot human action recognition, a prominent area in computer vision, has garnered increasing attention and broader use in real-life scenarios. Extracting spatio-temporal skeletal information from human movement videos offers interpretable and data-efficient features. However, existing spatio-temp...

Bibliographic Details
Main Authors:	Qingyang Xu, Jianjun Yang, Hongyi Zhang, Xin Jie, Danushka Bandara
Format:	Article
Language:	English
Published:	IEEE 2024-01-01
Series:	IEEE Access
Subjects:	Action recognition; few-shot learning; temporal alignment; adversarial training
Online Access:	https://ieeexplore.ieee.org/document/10433188/

_version_	1827325680576102400
author	Qingyang Xu; Jianjun Yang; Hongyi Zhang; Xin Jie; Danushka Bandara
author_facet	Qingyang Xu; Jianjun Yang; Hongyi Zhang; Xin Jie; Danushka Bandara
author_sort	Qingyang Xu
collection	DOAJ
description	Few-shot human action recognition, a prominent area in computer vision, has garnered increasing attention and broader use in real-life scenarios. Extracting spatio-temporal skeletal information from human movement videos offers interpretable and data-efficient features. However, existing spatio-temporal feature encoders face challenges such as handling sequence boundaries and coping with noise. To address these problems, this paper proposes a temporal complement method to optimize the Dynamic Time Warping (DTW) algorithm based on the feature representation of the human skeleton sequence. DTW helps to find the optimal alignment between sequences by warping them in the time domain. This is especially useful in scenarios where training data is limited. However, DTW has the drawback that the optimal alignment path is highly sensitive to errors in the time series distance matrix. Therefore, we apply a Virtual Adversarial Training method to improve the anti-noise capability of the algorithm. Here, we introduce adversarial perturbations to the time series distance matrix in the training phase, thus incentivizing the model to be resilient to such noise. Our method achieves the highest accuracy among the ProtoNet, DTW, and DASTM methods in the 5-way-1-shot setting on the NTU-S (77.7%) and Kinetics (41.2%) datasets. In the 5-way-5-shot setting, our method achieves the highest accuracy, 51.8%, on the Kinetics dataset compared with the other approaches.
first_indexed	2024-03-07T14:32:47Z
format	Article
id	doaj.art-3cb059fbaf4546918ae503e7a236a259
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-03-07T14:32:47Z
publishDate	2024-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-3cb059fbaf4546918ae503e7a236a259; 2024-03-06T00:00:53Z; eng; IEEE; IEEE Access; 2169-3536; 2024-01-01; vol. 12, pp. 31745-31755; 10.1109/ACCESS.2024.3365448; 10433188; Enhancing Few-Shot Action Recognition Using Skeleton Temporal Alignment and Adversarial Training; Qingyang Xu (0), Jianjun Yang (1), Hongyi Zhang (2), Xin Jie (3), Danushka Bandara (4), https://orcid.org/0000-0002-8885-622X; College of Computer Science, Zhejiang University, Hangzhou, China; Department of General Practice, Shandong Provincial Third Hospital, Shandong University, Jinan, China; School of Software Technology, Zhejiang University, Hangzhou, China; College of Computer Science, Zhejiang University, Hangzhou, China; Department of Computer Science and Engineering, Fairfield University, Fairfield, CT, USA; https://ieeexplore.ieee.org/document/10433188/; Action recognition; few-shot learning; temporal alignment; adversarial training
title	Enhancing Few-Shot Action Recognition Using Skeleton Temporal Alignment and Adversarial Training
title_sort	enhancing few shot action recognition using skeleton temporal alignment and adversarial training
topic	Action recognition; few-shot learning; temporal alignment; adversarial training
url	https://ieeexplore.ieee.org/document/10433188/

Similar Items