Skeleton‐based attention‐aware spatial–temporal model for action detection and recognition

Action detection and recognition are popular subjects of research in the field of computer vision. The task of action detection can be regarded as the sum of action location and recognition. Action features described by using information concerning the human skeleton have the advantages of robustnes...

Bibliographic Details
Main Authors: Ran Cui, Aichun Zhu, Jingran Wu, Gang Hua
Format: Article
Language: English
Published: Wiley 2020-08-01
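Series: IET Computer Vision
Subjects: human skeleton; skeleton-based action analysis model; action features; static features; dynamic features; skeleton joints
Online Access: https://doi.org/10.1049/iet-cvi.2019.0751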
author Ran Cui
Aichun Zhu
Jingran Wu
Gang Hua
collection DOAJ
description Action detection and recognition are popular subjects of research in the field of computer vision. The task of action detection can be regarded as the sum of action location and recognition. Action features described by using information concerning the human skeleton have the advantages of robustness against external factors and requiring a small amount of calculation. This study proposes a skeleton‐based action analysis model based on a recurrent neural network framework. The model learns action features by modelling the static and dynamic features of skeleton joints, and it learns the importance of different video frames through an attention module. For action location, a conditional random field loss function is introduced to establish the context dependency of output labels. For action recognition, a hierarchical training mechanism with triple loss models action features at coarse‐grained and fine‐grained levels. The authors’ proposed method delivers state‐of‐the‐art results on action location and recognition tasks.
first_indexed 2024-03-12T00:33:50Z
format Article
id doaj.art-11664c68d22c4587af430aa4e1c37d14
institution Directory Open Access Journal
issn 1751-9632
1751-9640
language English
last_indexed 2024-03-12T00:33:50Z
publishDate 2020-08-01
publisher Wiley
record_format Article
series IET Computer Vision
spelling doaj.art-11664c68d22c4587af430aa4e1c37d14
2023-09-15T10:06:15Z
eng
Wiley
IET Computer Vision
1751-9632
1751-9640
2020-08-01
vol. 14, no. 5, pp. 177-184
10.1049/iet-cvi.2019.0751
Skeleton‐based attention‐aware spatial–temporal model for action detection and recognition
Ran Cui (0), Aichun Zhu (1), Jingran Wu (2), Gang Hua (3)
(0) School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221008, People's Republic of China
(1) School of Computer Science and Technology, Nanjing Tech University, Nanjing 211800, People's Republic of China
(2) Department of Information and Electrical Engineering, Xuhai College, China University of Mining and Technology, Xuzhou 221008, People's Republic of China
(3) School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221008, People's Republic of China
title Skeleton‐based attention‐aware spatial–temporal model for action detection and recognition
topic human skeleton
skeleton-based action analysis model
action features
static features
dynamic features
skeleton joints