Few-shot action recognition with permutation-invariant attention
Many few-shot learning models focus on recognising images. In contrast, we tackle the challenging task of few-shot action recognition from videos. We build on a C3D encoder for spatio-temporal video blocks to capture short-range action patterns. The encoded blocks are aggregated by permutation-invariant pooling, making our approach robust to varying action lengths and to long-range temporal dependencies whose patterns are unlikely to repeat even in clips of the same class. Subsequently, the pooled representations are combined into simple relation descriptors which encode the so-called query and support clips. Finally, the relation descriptors are fed to a comparator whose goal is similarity learning between query and support clips. Importantly, to re-weight block contributions during pooling, we exploit spatial and temporal attention modules together with self-supervision. In naturalistic clips (of the same class) there exists a temporal distribution shift: the locations of discriminative temporal action hotspots vary. Thus, we permute the blocks of a clip and align the resulting attention regions with the similarly permuted attention regions of the non-permuted clip, training the attention mechanism to be invariant to block (and thus long-term hotspot) permutations. Our method outperforms the state of the art on the HMDB51, UCF101, and miniMIT datasets.
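The pipeline above (block features, attention-weighted permutation-invariant pooling, a relation descriptor, a comparator, plus a self-supervised permutation-alignment loss on the attention weights) can be sketched in a few lines. The following PyTorch snippet is a minimal illustration only: the convolutional attention scorer, the concatenation-based relation descriptor, the MLP comparator, and the MSE alignment loss are assumptions made for the sketch, not the paper's exact design.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Weighted average of block features. The average itself is
    order-invariant as long as each weight follows its block; the
    alignment loss below trains the weights toward that behaviour."""
    def __init__(self, dim):
        super().__init__()
        # kernel_size > 1 makes each weight depend on neighbouring
        # blocks, so permuting blocks genuinely perturbs the attention
        self.score = nn.Conv1d(dim, 1, kernel_size=3, padding=1)

    def forward(self, blocks):                        # blocks: (B, T, D)
        logits = self.score(blocks.transpose(1, 2))   # (B, 1, T)
        w = torch.softmax(logits.squeeze(1), dim=1)   # (B, T)
        pooled = torch.bmm(w.unsqueeze(1), blocks).squeeze(1)  # (B, D)
        return pooled, w

class Comparator(nn.Module):
    """Similarity head over a relation descriptor built from the pooled
    query and support clips (here simply their concatenation)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                 nn.Linear(dim, 1))

    def forward(self, q, s):
        return self.net(torch.cat([q, s], dim=-1))    # relation score

def permutation_alignment_loss(pool, blocks):
    """Self-supervision: permute a clip's blocks and ask that the new
    attention weights match the original weights under the same
    permutation (an MSE stand-in for the paper's alignment term)."""
    perm = torch.randperm(blocks.shape[1])
    _, w = pool(blocks)                  # weights on the original order
    _, w_perm = pool(blocks[:, perm])    # weights on the permuted clip
    return nn.functional.mse_loss(w_perm, w[:, perm])

# Dummy C3D-style block features: B clips, T blocks, D dimensions.
B, T, D = 4, 8, 256
query, support = torch.randn(B, T, D), torch.randn(B, T, D)
pool, comparator = AttentionPooling(D), Comparator(D)
q, _ = pool(query)
s, _ = pool(support)
score = comparator(q, s)                            # (B, 1) similarity
ss_loss = permutation_alignment_loss(pool, query)   # auxiliary loss
```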
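One deliberate quirk of the sketch: with a purely per-block scorer the attention would be permutation-equivariant by construction and the alignment loss would be trivially zero, so the scorer is given temporal context (a kernel wider than one block) and invariance is supplied as a training signal instead, mirroring the motivation for the paper's self-supervised alignment.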
Main Authors: | Zhang, H; Zhang, L; Qi, X; Li, H; Torr, PHS; Koniusz, P |
---|---|
Format: | Conference item |
Language: | English |
Published: | Springer, 2020 |