TFC-GCN: Lightweight Temporal Feature Cross-Extraction Graph Convolutional Network for Skeleton-Based Action Recognition

Graph convolutional networks (GCNs) hold clear advantages for skeleton-based action recognition. Existing state-of-the-art (SOTA) methods tend to focus on extracting and identifying features from all bones and joints, yet they overlook additional input features that could still be derived from the raw skeleton data. Moreover, many GCN-based action recognition models pay insufficient attention to the extraction of temporal features, and most have bloated structures with too many parameters. To solve these problems, a temporal feature cross-extraction graph convolutional network (TFC-GCN) with a small number of parameters is proposed. First, we propose a feature extraction strategy based on the relative displacement of each joint, computed between its previous and subsequent frames. Then, TFC-GCN uses a temporal feature cross-extraction block with gated information filtering to mine high-level representations of human actions. Finally, we propose a stitching spatial-temporal attention (SST-Att) block that assigns different weights to different joints, which yields favorable classification results. The FLOPs and parameter count of TFC-GCN reach 1.90 G and 0.18 M, respectively. Its superiority has been verified on three large-scale public datasets: NTU RGB+D 60, NTU RGB+D 120 and UAV-Human.
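
The relative-displacement input feature and the gated temporal filtering described above lend themselves to a short illustration. Below is a minimal PyTorch sketch, assuming joint coordinates of shape (batch, channels, frames, joints); the names, the zero-padding of edge frames and the sigmoid-gate form are illustrative assumptions, not the authors' actual TFC-GCN implementation.

import torch
import torch.nn as nn


def relative_displacement(x: torch.Tensor) -> torch.Tensor:
    # x: joint coordinates, shape (N, C, T, V) = (batch, xyz, frames, joints).
    # For each frame t, take the displacement between the subsequent and
    # previous frames, x[t+1] - x[t-1]; edge frames are zero-padded so the
    # output keeps the input shape.
    disp = torch.zeros_like(x)
    disp[:, :, 1:-1] = x[:, :, 2:] - x[:, :, :-2]
    return disp


class GatedTemporalBlock(nn.Module):
    # A generic gated temporal convolution: a sigmoid gate decides how much
    # of the temporal response passes on (one plausible reading of "gated
    # information filtering"; the paper's block may differ).
    def __init__(self, channels: int, kernel_size: int = 9):
        super().__init__()
        pad = (kernel_size - 1) // 2
        self.feat = nn.Conv2d(channels, channels, (kernel_size, 1), padding=(pad, 0))
        self.gate = nn.Conv2d(channels, channels, (kernel_size, 1), padding=(pad, 0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, V); convolve along the temporal axis only.
        return self.feat(x) * torch.sigmoid(self.gate(x))


# Usage: 2 samples, 3D joints, 64 frames, 25 joints (the NTU RGB+D layout).
x = torch.randn(2, 3, 64, 25)
motion = relative_displacement(x)              # same shape as x
out = GatedTemporalBlock(channels=3)(motion)   # (2, 3, 64, 25)

The displacement term supplies explicit motion cues alongside the raw joint coordinates, while the gate filters, per position, how much of the temporal feature response is kept.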

Bibliographic Details
Main Authors: Kaixuan Wang, Hongmin Deng (both: College of Electronics and Information Engineering, Sichuan University, No. 24, Section 1, First Ring Road, Wuhou District, Chengdu 610041, China)
Format: Article
Language: English
Published: MDPI AG, 2023-06-01
Series: Sensors, Vol. 23, Issue 12, Article 5593
DOI: 10.3390/s23125593
ISSN: 1424-8220
Collection: DOAJ (Directory of Open Access Journals)
Subjects: deep learning; action recognition; graph convolutional networks; lightweight
Online Access: https://www.mdpi.com/1424-8220/23/12/5593