Action recognition based on 2D skeletons extracted from RGB videos

Bibliographic Details
Main Authors: Aubry Sophie, Laraba Sohaib, Tilmanne Joëlle, Dutoit Thierry
Format: Article
Language: English
Published: EDP Sciences 2019-01-01
Series: MATEC Web of Conferences
Online Access: https://www.matec-conferences.org/articles/matecconf/pdf/2019/26/matecconf_jcmme2018_02034.pdf
_version_ 1818451012179984384
author Aubry Sophie
Laraba Sohaib
Tilmanne Joëlle
Dutoit Thierry
author_facet Aubry Sophie
Laraba Sohaib
Tilmanne Joëlle
Dutoit Thierry
author_sort Aubry Sophie
collection DOAJ
description In this paper a methodology to recognize actions based on RGB videos is proposed which takes advantage of recent breakthroughs in deep learning. Following the development of Convolutional Neural Networks (CNNs), research was conducted on the transformation of skeletal motion data into 2D images. In this work, a solution is proposed that requires only RGB videos instead of RGB-D videos, building on multiple works studying the conversion of RGB-D data into 2D images. From a video stream (RGB images), a two-dimensional skeleton of 18 joints for each detected body is extracted with a DNN-based human pose estimator called OpenPose. The skeleton data are encoded into the Red, Green and Blue channels of images, and different ways of encoding motion data into images were studied. State-of-the-art deep neural networks designed for image classification are then used successfully to recognize actions. Based on a study of related works, the image classification models SqueezeNet, AlexNet, DenseNet, ResNet, Inception and VGG were retrained to perform action recognition. All tests use the NTU RGB+D database. The highest accuracy is obtained with ResNet: 83.317% cross-subject and 88.780% cross-view, which outperforms most state-of-the-art results.
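The encoding described in the abstract (joint coordinates written into the colour channels of an image, one frame per column) can be sketched roughly as follows. This is a minimal illustration, not the paper's exact method: the normalization scheme and channel layout used by the authors may differ, and the function name is hypothetical. The joint count of 18 matches OpenPose's COCO-model output; since the skeletons are 2D, only two channels carry data here.

```python
import numpy as np

def skeleton_to_image(seq):
    """Encode a 2D skeleton sequence as an RGB image.

    seq: array of shape (T, 18, 2) -- T frames, 18 OpenPose joints,
    (x, y) pixel coordinates. Each frame becomes one image column;
    x is mapped to the red channel and y to the green channel
    (blue is left at zero because the skeletons are two-dimensional).
    """
    seq = np.asarray(seq, dtype=np.float32)
    t, joints, _ = seq.shape
    # Normalize each coordinate axis to [0, 255] over the whole sequence,
    # guarding against division by zero for a degenerate (static) axis.
    mins = seq.min(axis=(0, 1))
    maxs = seq.max(axis=(0, 1))
    norm = (seq - mins) / np.maximum(maxs - mins, 1e-6) * 255.0
    img = np.zeros((joints, t, 3), dtype=np.uint8)
    img[:, :, 0] = norm[:, :, 0].T  # x coordinates -> red channel
    img[:, :, 1] = norm[:, :, 1].T  # y coordinates -> green channel
    return img
```

The resulting fixed-size image can then be resized and fed to any off-the-shelf image classifier (ResNet, VGG, etc.) retrained on the action labels.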
first_indexed 2024-12-14T21:00:25Z
format Article
id doaj.art-0ea6ecf764244ec58041df88fbaac2a7
institution Directory Open Access Journal
issn 2261-236X
language English
last_indexed 2024-12-14T21:00:25Z
publishDate 2019-01-01
publisher EDP Sciences
record_format Article
series MATEC Web of Conferences
spelling doaj.art-0ea6ecf764244ec58041df88fbaac2a7 (2022-12-21T22:47:35Z)
EDP Sciences, MATEC Web of Conferences, ISSN 2261-236X, 2019-01-01, Vol. 277, Art. 02034
doi: 10.1051/matecconf/201927702034 (matecconf_jcmme2018_02034)
Action recognition based on 2D skeletons extracted from RGB videos
Aubry Sophie, Laraba Sohaib, Tilmanne Joëlle, Dutoit Thierry
spellingShingle Aubry Sophie
Laraba Sohaib
Tilmanne Joëlle
Dutoit Thierry
Action recognition based on 2D skeletons extracted from RGB videos
MATEC Web of Conferences
title Action recognition based on 2D skeletons extracted from RGB videos
title_full Action recognition based on 2D skeletons extracted from RGB videos
title_fullStr Action recognition based on 2D skeletons extracted from RGB videos
title_full_unstemmed Action recognition based on 2D skeletons extracted from RGB videos
title_short Action recognition based on 2D skeletons extracted from RGB videos
title_sort action recognition based on 2d skeletons extracted from rgb videos
url https://www.matec-conferences.org/articles/matecconf/pdf/2019/26/matecconf_jcmme2018_02034.pdf
work_keys_str_mv AT aubrysophie actionrecognitionbasedon2dskeletonsextractedfromrgbvideos
AT larabasohaib actionrecognitionbasedon2dskeletonsextractedfromrgbvideos
AT tilmannejoelle actionrecognitionbasedon2dskeletonsextractedfromrgbvideos
AT dutoitthierry actionrecognitionbasedon2dskeletonsextractedfromrgbvideos