Spatial‐temporal slowfast graph convolutional network for skeleton‐based action recognition

Abstract In skeleton‐based action recognition, the graph convolutional network (GCN) has achieved great success. Modelling skeleton data in a suitable spatial‐temporal way and designing the adjacency matrix are crucial aspects for GCN‐based methods to capture joint relationships. In this study, we propose the spatial‐temporal slowfast graph convolutional network (STSF‐GCN) and design the adjacency matrices for the skeleton data graphs in STSF‐GCN. STSF‐GCN contains two pathways: (1) the fast pathway operates at a high frame rate, and joints of adjacent frames are unified to build ‘small’ spatial‐temporal graphs. A new spatial‐temporal adjacency matrix is proposed for these ‘small’ spatial‐temporal graphs. Ablation studies verify the effectiveness of the proposed adjacency matrix. (2) The slow pathway operates at a low frame rate, and joints from all frames are unified to build one ‘big’ spatial‐temporal graph. The adjacency matrix for the ‘big’ spatial‐temporal graph is obtained by computing self‐attention coefficients of each joint. Finally, the outputs of the two pathways are fused to predict the action category. STSF‐GCN efficiently captures both long‐range and short‐range spatial‐temporal joint relationships. On three datasets for skeleton‐based action recognition, STSF‐GCN achieves state‐of‐the‐art performance at much lower computational cost.

Bibliographic Details
Main Authors: Zheng Fang, Xiongwei Zhang, Tieyong Cao, Yunfei Zheng, Meng Sun
Format: Article
Language: English
Published: Wiley 2022-04-01
Series: IET Computer Vision
Subjects: computer vision, graph theory, video signal processing, video signals
Online Access: https://doi.org/10.1049/cvi2.12080
_version_ 1828898124687474688
author Zheng Fang
Xiongwei Zhang
Tieyong Cao
Yunfei Zheng
Meng Sun
collection DOAJ
description Abstract In skeleton‐based action recognition, the graph convolutional network (GCN) has achieved great success. Modelling skeleton data in a suitable spatial‐temporal way and designing the adjacency matrix are crucial aspects for GCN‐based methods to capture joint relationships. In this study, we propose the spatial‐temporal slowfast graph convolutional network (STSF‐GCN) and design the adjacency matrices for the skeleton data graphs in STSF‐GCN. STSF‐GCN contains two pathways: (1) the fast pathway operates at a high frame rate, and joints of adjacent frames are unified to build ‘small’ spatial‐temporal graphs. A new spatial‐temporal adjacency matrix is proposed for these ‘small’ spatial‐temporal graphs. Ablation studies verify the effectiveness of the proposed adjacency matrix. (2) The slow pathway operates at a low frame rate, and joints from all frames are unified to build one ‘big’ spatial‐temporal graph. The adjacency matrix for the ‘big’ spatial‐temporal graph is obtained by computing self‐attention coefficients of each joint. Finally, the outputs of the two pathways are fused to predict the action category. STSF‐GCN efficiently captures both long‐range and short‐range spatial‐temporal joint relationships. On three datasets for skeleton‐based action recognition, STSF‐GCN achieves state‐of‐the‐art performance at much lower computational cost.
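The description above is detailed enough to sketch the overall shape of the method. The following PyTorch sketch is illustrative only: the exact spatial‐temporal adjacency construction (here, skeletal links plus self‐loops within a frame and between adjacent frames), the window length tau, the slow pathway's frame subsampling stride, the pooling, and fusion by concatenation are assumptions made for this example rather than details taken from the paper, and names such as STSFSketch, spatial_temporal_adjacency, and SelfAttentionAdjacency are hypothetical.
```python
# Minimal sketch of a slow/fast two-pathway skeleton GCN, assuming the choices above.
import torch
import torch.nn as nn
import torch.nn.functional as F


def spatial_temporal_adjacency(A: torch.Tensor, tau: int) -> torch.Tensor:
    """Normalised (tau*V, tau*V) adjacency for a 'small' spatial-temporal graph over a
    window of tau frames: skeletal links + self-loops within each frame and between
    adjacent frames. One plausible construction, not necessarily the paper's matrix."""
    V = A.size(0)
    block = A + torch.eye(V)                                      # skeletal links + self-loops
    ST = torch.zeros(tau * V, tau * V)
    for t in range(tau):
        ST[t * V:(t + 1) * V, t * V:(t + 1) * V] = block          # within frame t
        if t + 1 < tau:                                           # frame t <-> frame t+1
            ST[t * V:(t + 1) * V, (t + 1) * V:(t + 2) * V] = block
            ST[(t + 1) * V:(t + 2) * V, t * V:(t + 1) * V] = block
    d_inv_sqrt = ST.sum(dim=1).clamp(min=1e-6).pow(-0.5)          # D^{-1/2} ST D^{-1/2}
    return d_inv_sqrt.unsqueeze(1) * ST * d_inv_sqrt.unsqueeze(0)


class GraphConv(nn.Module):
    """Plain graph convolution X' = ReLU(A_hat X W); A_hat may be a shared (N, N)
    matrix or a per-sample (B, N, N) attention matrix (matmul broadcasts both)."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.lin = nn.Linear(in_channels, out_channels)

    def forward(self, x, A_hat):                                  # x: (B, N, C_in)
        return F.relu(self.lin(torch.matmul(A_hat, x)))


class SelfAttentionAdjacency(nn.Module):
    """Data-dependent adjacency from self-attention coefficients between joints."""
    def __init__(self, channels, d_k=32):
        super().__init__()
        self.query = nn.Linear(channels, d_k)
        self.key = nn.Linear(channels, d_k)
        self.scale = d_k ** 0.5

    def forward(self, x):                                         # x: (B, N, C) -> (B, N, N)
        scores = torch.matmul(self.query(x), self.key(x).transpose(1, 2)) / self.scale
        return scores.softmax(dim=-1)


class STSFSketch(nn.Module):
    """Toy two-pathway model over a skeleton sequence x of shape (B, T, V, C)."""
    def __init__(self, A, in_channels=3, hidden=64, num_classes=60, tau=3, slow_stride=4):
        super().__init__()
        self.tau, self.slow_stride = tau, slow_stride
        self.register_buffer("A_st", spatial_temporal_adjacency(A, tau))
        self.fast_gcn = GraphConv(in_channels, hidden)
        self.slow_embed = nn.Linear(in_channels, hidden)
        self.slow_attn = SelfAttentionAdjacency(hidden)
        self.slow_gcn = GraphConv(hidden, hidden)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):
        B, T, V, C = x.shape
        # Fast pathway: full frame rate, non-overlapping windows of tau frames,
        # each window treated as one 'small' spatial-temporal graph.
        T_fast = (T // self.tau) * self.tau
        fast = x[:, :T_fast].reshape(B * (T_fast // self.tau), self.tau * V, C)
        fast = self.fast_gcn(fast, self.A_st).mean(dim=1)          # pool joints per window
        fast = fast.reshape(B, -1, fast.size(-1)).mean(dim=1)      # pool windows
        # Slow pathway: subsampled frames; all joints form one 'big' graph whose
        # adjacency comes from self-attention coefficients of each joint.
        slow = self.slow_embed(x[:, ::self.slow_stride].reshape(B, -1, C))
        slow = self.slow_gcn(slow, self.slow_attn(slow)).mean(dim=1)
        # Fuse the two pathways and classify.
        return self.classifier(torch.cat([fast, slow], dim=-1))


# Toy usage: a 5-joint chain skeleton, 2 random sequences of 24 frames of 3-D joints.
V = 5
A = torch.zeros(V, V)
for i in range(V - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
model = STSFSketch(A, in_channels=3, num_classes=10)
print(model(torch.randn(2, 24, V, 3)).shape)                       # torch.Size([2, 10])
```
Even in this toy form the division of labour matches the description: the fast pathway applies a fixed, sparse spatial‐temporal adjacency to many short windows (short‐range relationships), while the slow pathway applies a learnt self‐attention adjacency to a single graph over all subsampled frames (long‐range relationships), and the two pooled features are fused for classification.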
first_indexed 2024-12-13T15:09:55Z
format Article
id doaj.art-266c6d4fbd704ca2b6cc1adf8a7a7031
institution Directory Open Access Journal
issn 1751-9632
1751-9640
language English
last_indexed 2024-12-13T15:09:55Z
publishDate 2022-04-01
publisher Wiley
record_format Article
series IET Computer Vision
spelling doaj.art-266c6d4fbd704ca2b6cc1adf8a7a7031 (record updated 2022-12-21T23:40:54Z). Zheng Fang, Xiongwei Zhang, Tieyong Cao, Yunfei Zheng, Meng Sun (all: Institute of Command and Control Engineering, People's Liberation Army Engineering University, Nanjing, Jiangsu, China), ‘Spatial‐temporal slowfast graph convolutional network for skeleton‐based action recognition’, IET Computer Vision (Wiley), ISSN 1751-9632 / 1751-9640, vol. 16, no. 3, pp. 205–217, 2022-04-01, https://doi.org/10.1049/cvi2.12080
title Spatial‐temporal slowfast graph convolutional network for skeleton‐based action recognition
topic computer vision
graph theory
video signal processing
video signals
url https://doi.org/10.1049/cvi2.12080