Skeleton‐based attention‐aware spatial–temporal model for action detection and recognition
Action detection and recognition are popular subjects of research in the field of computer vision. The task of action detection can be regarded as the sum of action location and recognition. Action features described by using information concerning the human skeleton have the advantages of robustnes...
Main Authors: | Ran Cui, Aichun Zhu, Jingran Wu, Gang Hua |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley 2020-08-01 |
Series: | IET Computer Vision |
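Subjects: | human skeleton, skeleton-based action analysis model, action features, static features, dynamic features, skeleton joints |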
Online Access: | https://doi.org/10.1049/iet-cvi.2019.0751 |
_version_ | 1797684721025548288 |
---|---|
author | Ran Cui, Aichun Zhu, Jingran Wu, Gang Hua |
author_facet | Ran Cui, Aichun Zhu, Jingran Wu, Gang Hua |
author_sort | Ran Cui |
collection | DOAJ |
description | Action detection and recognition are popular subjects of research in the field of computer vision. The task of action detection can be regarded as the sum of action location and recognition. Action features described by using information concerning the human skeleton have the advantages of robustness against external factors and requiring a small amount of calculation. This study proposes a skeleton‐based action analysis model based on a recurrent neural network framework. The model learns action features by modelling static and dynamic features of skeleton joints and the importance of different video frames by introducing an attention module. For action location, conditional random field loss function is introduced to establish the context dependency of output labels. In the aspect of action recognition, the hierarchical training mechanism with triple loss models action features at coarse‐grained and fine‐grained levels. The authors’ proposed method delivers state‐of‐the‐art results on action location and recognition tasks. |
first_indexed | 2024-03-12T00:33:50Z |
format | Article |
id | doaj.art-11664c68d22c4587af430aa4e1c37d14 |
institution | Directory Open Access Journal |
issn | 1751-9632 1751-9640 |
language | English |
last_indexed | 2024-03-12T00:33:50Z |
publishDate | 2020-08-01 |
publisher | Wiley |
record_format | Article |
series | IET Computer Vision |
spelling | doaj.art-11664c68d22c4587af430aa4e1c37d14; 2023-09-15T10:06:15Z; eng; Wiley; IET Computer Vision; 1751-9632; 1751-9640; 2020-08-01; 14; 5; 177; 184; 10.1049/iet-cvi.2019.0751; Skeleton‐based attention‐aware spatial–temporal model for action detection and recognition; Ran Cui (0), Aichun Zhu (1), Jingran Wu (2), Gang Hua (3); School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221008, People's Republic of China; School of Computer Science and Technology, Nanjing Tech University, Nanjing 211800, People's Republic of China; Department of Information and Electrical Engineering, Xuhai College, China University of Mining and Technology, Xuzhou 221008, People's Republic of China; School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221008, People's Republic of China; Action detection and recognition are popular subjects of research in the field of computer vision. The task of action detection can be regarded as the sum of action location and recognition. Action features described by using information concerning the human skeleton have the advantages of robustness against external factors and requiring a small amount of calculation. This study proposes a skeleton‐based action analysis model based on a recurrent neural network framework. The model learns action features by modelling static and dynamic features of skeleton joints and the importance of different video frames by introducing an attention module. For action location, conditional random field loss function is introduced to establish the context dependency of output labels. In the aspect of action recognition, the hierarchical training mechanism with triple loss models action features at coarse‐grained and fine‐grained levels. The authors’ proposed method delivers state‐of‐the‐art results on action location and recognition tasks.; https://doi.org/10.1049/iet-cvi.2019.0751; human skeleton; skeleton-based action analysis model; action features; static features; dynamic features; skeleton joints |
spellingShingle | Ran Cui Aichun Zhu Jingran Wu Gang Hua Skeleton‐based attention‐aware spatial–temporal model for action detection and recognition IET Computer Vision human skeleton skeleton-based action analysis model action features static features dynamic features skeleton joints |
title | Skeleton‐based attention‐aware spatial–temporal model for action detection and recognition |
title_full | Skeleton‐based attention‐aware spatial–temporal model for action detection and recognition |
title_fullStr | Skeleton‐based attention‐aware spatial–temporal model for action detection and recognition |
title_full_unstemmed | Skeleton‐based attention‐aware spatial–temporal model for action detection and recognition |
title_short | Skeleton‐based attention‐aware spatial–temporal model for action detection and recognition |
title_sort | skeleton based attention aware spatial temporal model for action detection and recognition |
topic | human skeleton; skeleton-based action analysis model; action features; static features; dynamic features; skeleton joints |
url | https://doi.org/10.1049/iet-cvi.2019.0751 |
work_keys_str_mv | AT rancui skeletonbasedattentionawarespatialtemporalmodelforactiondetectionandrecognition AT aichunzhu skeletonbasedattentionawarespatialtemporalmodelforactiondetectionandrecognition AT jingranwu skeletonbasedattentionawarespatialtemporalmodelforactiondetectionandrecognition AT ganghua skeletonbasedattentionawarespatialtemporalmodelforactiondetectionandrecognition |