Feature Fusion of Deep Spatial Features and Handcrafted Spatiotemporal Features for Human Action Recognition
Human action recognition plays a significant part in the research community due to its emerging applications. A variety of approaches have been proposed to resolve this problem; however, several issues still need to be addressed. In action recognition, effectively extracting and aggregating spatiotemporal information plays a vital role in describing a video. In this research, we propose a novel approach to recognizing human actions that considers both deep spatial features and handcrafted spatiotemporal features. First, we extract deep spatial features by employing a state-of-the-art deep convolutional network, namely Inception-ResNet-v2. Second, we introduce a novel handcrafted feature descriptor, the Weber’s law based Volume Local Gradient Ternary Pattern (WVLGTP), which brings out the spatiotemporal features and also captures shape information through a gradient operation. Furthermore, a Weber’s law based threshold and a ternary pattern based on an adaptive local threshold are presented to effectively handle noisy center pixel values. A multi-resolution approach for WVLGTP based on an averaging scheme is also presented. Afterward, both extracted features are concatenated and fed to a Support Vector Machine for classification. Lastly, extensive experimental analysis shows that our proposed method outperforms state-of-the-art approaches in terms of accuracy.
Main Authors: | Md Azher Uddin, Young-Koo Lee |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2019-04-01 |
Series: | Sensors |
Subjects: | deep spatial features; spatiotemporal features; Inception-Resnet-v2; Weber’s law based volume local gradient ternary pattern |
Online Access: | https://www.mdpi.com/1424-8220/19/7/1599 |
author | Md Azher Uddin Young-Koo Lee |
collection | DOAJ |
description | Human action recognition plays a significant part in the research community due to its emerging applications. A variety of approaches have been proposed to resolve this problem; however, several issues still need to be addressed. In action recognition, effectively extracting and aggregating spatiotemporal information plays a vital role in describing a video. In this research, we propose a novel approach to recognizing human actions that considers both deep spatial features and handcrafted spatiotemporal features. First, we extract deep spatial features by employing a state-of-the-art deep convolutional network, namely Inception-ResNet-v2. Second, we introduce a novel handcrafted feature descriptor, the Weber’s law based Volume Local Gradient Ternary Pattern (WVLGTP), which brings out the spatiotemporal features and also captures shape information through a gradient operation. Furthermore, a Weber’s law based threshold and a ternary pattern based on an adaptive local threshold are presented to effectively handle noisy center pixel values. A multi-resolution approach for WVLGTP based on an averaging scheme is also presented. Afterward, both extracted features are concatenated and fed to a Support Vector Machine for classification. Lastly, extensive experimental analysis shows that our proposed method outperforms state-of-the-art approaches in terms of accuracy. |
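The description above outlines a two-stream pipeline: deep spatial features from Inception-ResNet-v2, a handcrafted ternary-pattern descriptor with a Weber's law based threshold, concatenation, and an SVM classifier. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the deep feature is a random placeholder standing in for Inception-ResNet-v2 output, the descriptor is a simplified single-frame local gradient ternary pattern rather than the full volume-based multi-resolution WVLGTP, and all function names and the `alpha` parameter are illustrative assumptions.

```python
import numpy as np

def weber_threshold(center, alpha=0.1, eps=1e-8):
    # Weber's law: the just-noticeable change scales with stimulus
    # intensity, so the threshold is proportional to the center pixel
    # value (alpha is an assumed proportionality constant).
    return alpha * center + eps

def local_gradient_ternary_pattern(patch, alpha=0.1):
    """Toy ternary code over a 3x3 patch: each neighbor's difference
    from the center is compared against a Weber-scaled threshold,
    yielding -1 / 0 / +1 per neighbor."""
    center = float(patch[1, 1])
    t = weber_threshold(center, alpha)
    diffs = np.delete(patch.flatten().astype(float), 4) - center  # 8 neighbors
    return np.where(diffs > t, 1, np.where(diffs < -t, -1, 0))

def ternary_histogram(frame, alpha=0.1):
    """Slide over a frame, split the ternary codes into upper (+1) and
    lower (-1) binary patterns (standard ternary-pattern practice) and
    accumulate a 256-bin histogram for each."""
    h, w = frame.shape
    upper_hist, lower_hist = np.zeros(256), np.zeros(256)
    weights = 2 ** np.arange(8)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            codes = local_gradient_ternary_pattern(frame[i-1:i+2, j-1:j+2], alpha)
            upper_hist[int(((codes == 1) * weights).sum())] += 1
            lower_hist[int(((codes == -1) * weights).sum())] += 1
    return np.concatenate([upper_hist, lower_hist])

# Fusion step: concatenate a (placeholder) deep spatial feature vector
# with the handcrafted histogram; the real pipeline would then feed
# this fused vector to a Support Vector Machine.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(16, 16)).astype(float)
deep_feat = rng.standard_normal(1536)  # Inception-ResNet-v2's final pooling width
fused = np.concatenate([deep_feat, ternary_histogram(frame)])
print(fused.shape)  # → (2048,)
```

The split into upper and lower binary histograms keeps the descriptor length at 2 × 256 bins while preserving the three-valued information; the fused vector would be the per-video input to the SVM.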
id | doaj.art-797193ed164e41e3ab81d892c2ea89f6 |
institution | Directory Open Access Journal |
issn | 1424-8220 |
spelling | doaj.art-797193ed164e41e3ab81d892c2ea89f6 | MDPI AG | Sensors, vol. 19, no. 7, article 1599 (2019-04-01) | DOI: 10.3390/s19071599 | Md Azher Uddin and Young-Koo Lee, Department of Computer Science and Engineering, Kyung Hee University, Global Campus, Yongin 17104, Korea |
title | Feature Fusion of Deep Spatial Features and Handcrafted Spatiotemporal Features for Human Action Recognition |
topic | deep spatial features; spatiotemporal features; Inception-Resnet-v2; Weber’s law based volume local gradient ternary pattern |
url | https://www.mdpi.com/1424-8220/19/7/1599 |