Spatial‐temporal slowfast graph convolutional network for skeleton‐based action recognition
Abstract In skeleton‐based action recognition, the graph convolutional network (GCN) has achieved great success. Modelling skeleton data in a suitable spatial‐temporal way and designing the adjacency matrix are crucial aspects for GCN‐based methods to capture joint relationships. In this study, we p...
Main Authors: | Zheng Fang, Xiongwei Zhang, Tieyong Cao, Yunfei Zheng, Meng Sun |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2022-04-01 |
Series: | IET Computer Vision |
Subjects: | computer vision; graph theory; video signal processing; video signals |
Online Access: | https://doi.org/10.1049/cvi2.12080 |
_version_ | 1828898124687474688 |
---|---|
author | Zheng Fang; Xiongwei Zhang; Tieyong Cao; Yunfei Zheng; Meng Sun |
author_facet | Zheng Fang; Xiongwei Zhang; Tieyong Cao; Yunfei Zheng; Meng Sun |
author_sort | Zheng Fang |
collection | DOAJ |
description | Abstract: In skeleton‐based action recognition, graph convolutional networks (GCNs) have achieved great success. Modelling skeleton data in a suitable spatial‐temporal way and designing the adjacency matrix are crucial for GCN‐based methods to capture joint relationships. In this study, we propose the spatial‐temporal slowfast graph convolutional network (STSF‐GCN) and design the adjacency matrices for its skeleton data graphs. STSF‐GCN contains two pathways. (1) The fast pathway operates at a high frame rate, and joints of adjacent frames are unified to build ‘small’ spatial‐temporal graphs. A new spatial‐temporal adjacency matrix is proposed for these ‘small’ graphs, and ablation studies verify its effectiveness. (2) The slow pathway operates at a low frame rate, and joints from all frames are unified to build one ‘big’ spatial‐temporal graph, whose adjacency matrix is obtained by computing self‐attention coefficients for each joint. Finally, the outputs of the two pathways are fused to predict the action category. STSF‐GCN can efficiently capture both long‐range and short‐range spatial‐temporal joint relationships. On three datasets for skeleton‐based action recognition, STSF‐GCN achieves state‐of‐the‐art performance at much lower computational cost. |
first_indexed | 2024-12-13T15:09:55Z |
format | Article |
id | doaj.art-266c6d4fbd704ca2b6cc1adf8a7a7031 |
institution | Directory Open Access Journal |
issn | 1751-9632 1751-9640 |
language | English |
last_indexed | 2024-12-13T15:09:55Z |
publishDate | 2022-04-01 |
publisher | Wiley |
record_format | Article |
series | IET Computer Vision |
spelling | doaj.art-266c6d4fbd704ca2b6cc1adf8a7a7031; 2022-12-21T23:40:54Z; eng; Wiley; IET Computer Vision; ISSN 1751-9632, 1751-9640; 2022-04-01; volume 16, issue 3, pages 205-217; 10.1049/cvi2.12080; Spatial‐temporal slowfast graph convolutional network for skeleton‐based action recognition; Zheng Fang, Xiongwei Zhang, Tieyong Cao, Yunfei Zheng, Meng Sun (all: Institute of Command and Control Engineering, People's Liberation Army Engineering University, Nanjing, Jiangsu, China); https://doi.org/10.1049/cvi2.12080; computer vision; graph theory; video signal processing; video signals |
title | Spatial‐temporal slowfast graph convolutional network for skeleton‐based action recognition |
topic | computer vision; graph theory; video signal processing; video signals |
url | https://doi.org/10.1049/cvi2.12080 |
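
The description field states that the slow pathway merges joints from all frames into one 'big' graph whose adjacency matrix comes from self-attention coefficients. The sketch below is one possible reading of that idea, not the authors' implementation: it assumes plain scaled dot-product attention with no learned projections, and the function names (`self_attention_adjacency`, `aggregate`) and the toy coordinates are hypothetical.

```python
import math


def self_attention_adjacency(features):
    """Return a row-normalised attention matrix over all joints.

    features: list of N equal-length vectors, one per joint, where
    N = frames * joints_per_frame once every frame is merged into one graph.
    Assumption: raw features act as both queries and keys.
    """
    scale = math.sqrt(len(features[0]))
    adjacency = []
    for query in features:
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in features]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        adjacency.append([e / total for e in exps])
    return adjacency


def aggregate(adjacency, features):
    """One propagation step: each joint becomes a weighted mean of all joints."""
    dim = len(features[0])
    return [
        [sum(w * feat[d] for w, feat in zip(row, features)) for d in range(dim)]
        for row in adjacency
    ]


# Toy input (made-up values): 2 frames x 3 joints, 2-D coordinates per joint.
frames = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    [[0.1, 0.0], [1.0, 0.2], [0.0, 1.1]],
]
joints = [joint for frame in frames for joint in frame]  # N = 6 graph nodes
A = self_attention_adjacency(joints)  # 6 x 6, each row sums to 1
out = aggregate(A, joints)            # 6 x 2 updated joint features
```

Because every joint attends to every joint in every frame, a single matrix of this kind can link distant joints across distant frames, which is the long-range relationship the description attributes to the slow pathway. A real model would add learned weights and the fast pathway's separate adjacency matrix, neither of which is specified in this record.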