Pose-Guided Graph Convolutional Networks for Skeleton-Based Action Recognition

Bibliographic Details
Main Authors: Han Chen, Yifan Jiang, Hanseok Ko
Format: Article
Language: English
Published: IEEE, 2022-01-01
Series: IEEE Access
Subjects: Action recognition; attention mechanism; feature fusion; graph convolutional networks; human skeleton; pose information
Online Access: https://ieeexplore.ieee.org/document/9919831/
_version_ 1811192253751754752
author Han Chen
Yifan Jiang
Hanseok Ko
author_facet Han Chen
Yifan Jiang
Hanseok Ko
author_sort Han Chen
collection DOAJ
description Graph convolutional networks (GCNs), which model human body skeletons as spatial and temporal graphs, have shown remarkable potential in skeleton-based action recognition. However, in existing GCN-based methods, the graph-structured representation of the human skeleton makes it difficult to fuse with other modalities, especially in the early stages. This may limit their scalability and performance in action recognition tasks. In addition, pose information, which naturally contains informative and discriminative clues for action recognition, is rarely explored together with skeleton data in existing methods. In this work, we propose the pose-guided GCN (PG-GCN), a multi-modal framework for high-performance human action recognition. In particular, a multi-stream network is constructed to simultaneously explore robust features from both the pose and skeleton data, while a dynamic attention module is designed for early-stage feature fusion. The core idea of this module is to use a trainable graph to aggregate features from the skeleton stream with those of the pose stream, which yields a network with more robust feature representation ability. Extensive experiments show that the proposed PG-GCN achieves state-of-the-art performance on the NTU RGB+D 60 and NTU RGB+D 120 datasets.
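The abstract describes fusing pose-stream and skeleton-stream features through a trainable graph. The following is a minimal NumPy sketch of that general idea, not the paper's implementation: all shapes, variable names, and the row-softmax normalization of the learned graph are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of trainable-graph early fusion, as described in the
# abstract. Shapes and names are illustrative assumptions, not the paper's.
rng = np.random.default_rng(0)

J = 25   # joints per skeleton in NTU RGB+D
C = 64   # feature channels per joint

skeleton_feat = rng.standard_normal((J, C))  # skeleton-stream features
pose_feat = rng.standard_normal((J, C))      # pose-stream features

# Trainable attention graph: one logit per (joint_i, joint_j) pair. A softmax
# over each row makes it a normalized aggregation weight matrix.
logits = rng.standard_normal((J, J))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Aggregate pose features over the learned graph, then fuse with the
# skeleton features via a residual sum.
fused = skeleton_feat + attn @ pose_feat

print(fused.shape)                          # (25, 64)
print(np.allclose(attn.sum(axis=1), 1.0))   # True: each row sums to 1
```

In a real model the logits would be learned parameters updated by backpropagation; here they are random placeholders so the aggregation step itself can be inspected in isolation.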
first_indexed 2024-04-11T23:48:21Z
format Article
id doaj.art-18cfa5b04f544aebaba6f817dd964c2e
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-11T23:48:21Z
publishDate 2022-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-18cfa5b04f544aebaba6f817dd964c2e | 2022-12-22T03:56:33Z | eng | IEEE | IEEE Access | 2169-3536 | 2022-01-01 | vol. 10, pp. 111725-111731 | doi:10.1109/ACCESS.2022.3214812 | 9919831 | Pose-Guided Graph Convolutional Networks for Skeleton-Based Action Recognition | Han Chen (https://orcid.org/0000-0002-6315-802X); Yifan Jiang; Hanseok Ko (https://orcid.org/0000-0002-8744-4514) | School of Electrical Engineering, Korea University, Seoul, South Korea | https://ieeexplore.ieee.org/document/9919831/ | Action recognition; attention mechanism; feature fusion; graph convolutional networks; human skeleton; pose information
spellingShingle Han Chen
Yifan Jiang
Hanseok Ko
Pose-Guided Graph Convolutional Networks for Skeleton-Based Action Recognition
IEEE Access
Action recognition
attention mechanism
feature fusion
graph convolutional networks
human skeleton
pose information
title Pose-Guided Graph Convolutional Networks for Skeleton-Based Action Recognition
title_full Pose-Guided Graph Convolutional Networks for Skeleton-Based Action Recognition
title_fullStr Pose-Guided Graph Convolutional Networks for Skeleton-Based Action Recognition
title_full_unstemmed Pose-Guided Graph Convolutional Networks for Skeleton-Based Action Recognition
title_short Pose-Guided Graph Convolutional Networks for Skeleton-Based Action Recognition
title_sort pose guided graph convolutional networks for skeleton based action recognition
topic Action recognition
attention mechanism
feature fusion
graph convolutional networks
human skeleton
pose information
url https://ieeexplore.ieee.org/document/9919831/
work_keys_str_mv AT hanchen poseguidedgraphconvolutionalnetworksforskeletonbasedactionrecognition
AT yifanjiang poseguidedgraphconvolutionalnetworksforskeletonbasedactionrecognition
AT hanseokko poseguidedgraphconvolutionalnetworksforskeletonbasedactionrecognition