Spatial‐temporal slowfast graph convolutional network for skeleton‐based action recognition

Abstract In skeleton‐based action recognition, the graph convolutional network (GCN) has achieved great success. Modelling skeleton data in a suitable spatial‐temporal way and designing the adjacency matrix are crucial aspects for GCN‐based methods to capture joint relationships. In this study, we propose the spatial‐temporal slowfast graph convolutional network (STSF‐GCN) and design the adjacency matrices for the skeleton data graphs in STSF‐GCN. STSF‐GCN contains two pathways: (1) the fast pathway operates at a high frame rate, and joints of adjacent frames are unified to build ‘small’ spatial‐temporal graphs. A new spatial‐temporal adjacency matrix is proposed for these ‘small’ spatial‐temporal graphs. Ablation studies verify the effectiveness of the proposed adjacency matrix. (2) The slow pathway operates at a low frame rate, and joints from all frames are unified to build one ‘big’ spatial‐temporal graph. The adjacency matrix for the ‘big’ spatial‐temporal graph is obtained by computing self‐attention coefficients of each joint. Finally, the outputs of the two pathways are fused to predict the action category. STSF‐GCN efficiently captures both long‐range and short‐range spatial‐temporal joint relationships. On three datasets for skeleton‐based action recognition, STSF‐GCN achieves state‐of‐the‐art performance at much lower computational cost.

Bibliographic Details
Main Authors: Zheng Fang, Xiongwei Zhang, Tieyong Cao, Yunfei Zheng, Meng Sun
Format: Article
Language: English
Published: Wiley 2022-04-01
Series: IET Computer Vision
Subjects: computer vision, graph theory, video signal processing, video signals
Online Access: https://doi.org/10.1049/cvi2.12080
_version_ 1828898124687474688
author Zheng Fang
Xiongwei Zhang
Tieyong Cao
Yunfei Zheng
Meng Sun
collection DOAJ
description Abstract In skeleton‐based action recognition, the graph convolutional network (GCN) has achieved great success. Modelling skeleton data in a suitable spatial‐temporal way and designing the adjacency matrix are crucial aspects for GCN‐based methods to capture joint relationships. In this study, we propose the spatial‐temporal slowfast graph convolutional network (STSF‐GCN) and design the adjacency matrices for the skeleton data graphs in STSF‐GCN. STSF‐GCN contains two pathways: (1) the fast pathway operates at a high frame rate, and joints of adjacent frames are unified to build ‘small’ spatial‐temporal graphs. A new spatial‐temporal adjacency matrix is proposed for these ‘small’ spatial‐temporal graphs. Ablation studies verify the effectiveness of the proposed adjacency matrix. (2) The slow pathway operates at a low frame rate, and joints from all frames are unified to build one ‘big’ spatial‐temporal graph. The adjacency matrix for the ‘big’ spatial‐temporal graph is obtained by computing self‐attention coefficients of each joint. Finally, the outputs of the two pathways are fused to predict the action category. STSF‐GCN efficiently captures both long‐range and short‐range spatial‐temporal joint relationships. On three datasets for skeleton‐based action recognition, STSF‐GCN achieves state‐of‐the‐art performance at much lower computational cost.
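The description above is detailed enough to sketch the overall shape of the method. The following PyTorch sketch is illustrative only: the exact spatial‐temporal adjacency construction (here, skeletal links plus self‐loops within a frame and between adjacent frames), the window length tau, the slow pathway's frame subsampling stride, the pooling, and fusion by concatenation are assumptions made for this example rather than details taken from the paper, and names such as STSFSketch, spatial_temporal_adjacency, and SelfAttentionAdjacency are hypothetical.
```python
# Minimal sketch of a slow/fast two-pathway skeleton GCN, assuming the choices above.
import torch
import torch.nn as nn
import torch.nn.functional as F


def spatial_temporal_adjacency(A: torch.Tensor, tau: int) -> torch.Tensor:
    """Normalised (tau*V, tau*V) adjacency for a 'small' spatial-temporal graph over a
    window of tau frames: skeletal links + self-loops within each frame and between
    adjacent frames. One plausible construction, not necessarily the paper's matrix."""
    V = A.size(0)
    block = A + torch.eye(V)                                      # skeletal links + self-loops
    ST = torch.zeros(tau * V, tau * V)
    for t in range(tau):
        ST[t * V:(t + 1) * V, t * V:(t + 1) * V] = block          # within frame t
        if t + 1 < tau:                                           # frame t <-> frame t+1
            ST[t * V:(t + 1) * V, (t + 1) * V:(t + 2) * V] = block
            ST[(t + 1) * V:(t + 2) * V, t * V:(t + 1) * V] = block
    d_inv_sqrt = ST.sum(dim=1).clamp(min=1e-6).pow(-0.5)          # D^{-1/2} ST D^{-1/2}
    return d_inv_sqrt.unsqueeze(1) * ST * d_inv_sqrt.unsqueeze(0)


class GraphConv(nn.Module):
    """Plain graph convolution X' = ReLU(A_hat X W); A_hat may be a shared (N, N)
    matrix or a per-sample (B, N, N) attention matrix (matmul broadcasts both)."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.lin = nn.Linear(in_channels, out_channels)

    def forward(self, x, A_hat):                                  # x: (B, N, C_in)
        return F.relu(self.lin(torch.matmul(A_hat, x)))


class SelfAttentionAdjacency(nn.Module):
    """Data-dependent adjacency from self-attention coefficients between joints."""
    def __init__(self, channels, d_k=32):
        super().__init__()
        self.query = nn.Linear(channels, d_k)
        self.key = nn.Linear(channels, d_k)
        self.scale = d_k ** 0.5

    def forward(self, x):                                         # x: (B, N, C) -> (B, N, N)
        scores = torch.matmul(self.query(x), self.key(x).transpose(1, 2)) / self.scale
        return scores.softmax(dim=-1)


class STSFSketch(nn.Module):
    """Toy two-pathway model over a skeleton sequence x of shape (B, T, V, C)."""
    def __init__(self, A, in_channels=3, hidden=64, num_classes=60, tau=3, slow_stride=4):
        super().__init__()
        self.tau, self.slow_stride = tau, slow_stride
        self.register_buffer("A_st", spatial_temporal_adjacency(A, tau))
        self.fast_gcn = GraphConv(in_channels, hidden)
        self.slow_embed = nn.Linear(in_channels, hidden)
        self.slow_attn = SelfAttentionAdjacency(hidden)
        self.slow_gcn = GraphConv(hidden, hidden)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):
        B, T, V, C = x.shape
        # Fast pathway: full frame rate, non-overlapping windows of tau frames,
        # each window treated as one 'small' spatial-temporal graph.
        T_fast = (T // self.tau) * self.tau
        fast = x[:, :T_fast].reshape(B * (T_fast // self.tau), self.tau * V, C)
        fast = self.fast_gcn(fast, self.A_st).mean(dim=1)          # pool joints per window
        fast = fast.reshape(B, -1, fast.size(-1)).mean(dim=1)      # pool windows
        # Slow pathway: subsampled frames; all joints form one 'big' graph whose
        # adjacency comes from self-attention coefficients of each joint.
        slow = self.slow_embed(x[:, ::self.slow_stride].reshape(B, -1, C))
        slow = self.slow_gcn(slow, self.slow_attn(slow)).mean(dim=1)
        # Fuse the two pathways and classify.
        return self.classifier(torch.cat([fast, slow], dim=-1))


# Toy usage: a 5-joint chain skeleton, 2 random sequences of 24 frames of 3-D joints.
V = 5
A = torch.zeros(V, V)
for i in range(V - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
model = STSFSketch(A, in_channels=3, num_classes=10)
print(model(torch.randn(2, 24, V, 3)).shape)                       # torch.Size([2, 10])
```
Even in this toy form the division of labour matches the description: the fast pathway applies a fixed, sparse spatial‐temporal adjacency to many short windows (short‐range relationships), while the slow pathway applies a learnt self‐attention adjacency to a single graph over all subsampled frames (long‐range relationships), and the two pooled features are fused for classification.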
first_indexed 2024-12-13T15:09:55Z
format Article
id doaj.art-266c6d4fbd704ca2b6cc1adf8a7a7031
institution Directory Open Access Journal
issn 1751-9632
1751-9640
language English
last_indexed 2024-12-13T15:09:55Z
publishDate 2022-04-01
publisher Wiley
record_format Article
series IET Computer Vision
spelling doaj.art-266c6d4fbd704ca2b6cc1adf8a7a7031 (record updated 2022-12-21T23:40:54Z). Zheng Fang, Xiongwei Zhang, Tieyong Cao, Yunfei Zheng, Meng Sun (all: Institute of Command and Control Engineering, People's Liberation Army Engineering University, Nanjing, Jiangsu, China), ‘Spatial‐temporal slowfast graph convolutional network for skeleton‐based action recognition’, IET Computer Vision (Wiley), ISSN 1751-9632 / 1751-9640, vol. 16, no. 3, pp. 205–217, 2022-04-01, https://doi.org/10.1049/cvi2.12080
title Spatial‐temporal slowfast graph convolutional network for skeleton‐based action recognition
topic computer vision
graph theory
video signal processing
video signals
url https://doi.org/10.1049/cvi2.12080