Feature Fusion of Deep Spatial Features and Handcrafted Spatiotemporal Features for Human Action Recognition


Bibliographic Details
Main Authors: Md Azher Uddin, Young-Koo Lee
Format: Article
Language: English
Published: MDPI AG 2019-04-01
Series: Sensors
Subjects:
Online Access: https://www.mdpi.com/1424-8220/19/7/1599
_version_ 1811279910071697408
author Md Azher Uddin
Young-Koo Lee
author_facet Md Azher Uddin
Young-Koo Lee
author_sort Md Azher Uddin
collection DOAJ
description Human action recognition plays a significant part in the research community due to its emerging applications. A variety of approaches have been proposed to resolve this problem; however, several issues still need to be addressed. In action recognition, effectively extracting and aggregating spatiotemporal information plays a vital role in describing a video. In this research, we propose a novel approach to recognizing human actions that considers both deep spatial features and handcrafted spatiotemporal features. First, we extract deep spatial features by employing a state-of-the-art deep convolutional network, namely Inception-Resnet-v2. Second, we introduce a novel handcrafted feature descriptor, namely the Weber’s law based Volume Local Gradient Ternary Pattern (WVLGTP), which brings out the spatiotemporal features. It also captures shape information through a gradient operation. Furthermore, a Weber’s law based threshold value and a ternary pattern based on an adaptive local threshold are presented to effectively handle noisy center pixel values. In addition, a multi-resolution approach for WVLGTP based on an averaging scheme is also presented. Afterward, both extracted features are concatenated and fed to a Support Vector Machine to perform the classification. Lastly, extensive experimental analysis shows that our proposed method outperforms state-of-the-art approaches in terms of accuracy.
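The fusion pipeline the abstract describes (deep spatial features and handcrafted spatiotemporal features concatenated, then classified with an SVM) can be sketched as below. This is a minimal illustration, not the paper's implementation: `deep_spatial_features` and `wvlgtp_features` are hypothetical placeholders returning random vectors, standing in for Inception-Resnet-v2 activations and the WVLGTP descriptor, and the feature dimensions are assumed for illustration only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def deep_spatial_features(video):
    # Placeholder for pooled Inception-Resnet-v2 activations
    # (1536-d is assumed here, not taken from the paper).
    return rng.normal(size=1536)

def wvlgtp_features(video):
    # Placeholder for a WVLGTP histogram descriptor
    # (256 bins assumed for illustration).
    return rng.normal(size=256)

videos = [f"clip_{i}" for i in range(20)]
labels = np.array([0, 1] * 10)  # toy two-class action labels

# Fusion by simple concatenation of the two feature vectors per video
X = np.array([np.concatenate([deep_spatial_features(v), wvlgtp_features(v)])
              for v in videos])
X = StandardScaler().fit_transform(X)

# Feed the fused features to a Support Vector Machine
clf = SVC(kernel="linear").fit(X, labels)
print(X.shape)  # fused feature matrix: (20, 1792)
```

Concatenation keeps the two feature spaces independent and lets the SVM weight each dimension; scaling before the SVM matters because the deep and handcrafted features typically live on different numeric ranges.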
first_indexed 2024-04-13T01:04:21Z
format Article
id doaj.art-797193ed164e41e3ab81d892c2ea89f6
institution Directory Open Access Journal
issn 1424-8220
language English
last_indexed 2024-04-13T01:04:21Z
publishDate 2019-04-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj.art-797193ed164e41e3ab81d892c2ea89f62022-12-22T03:09:23ZengMDPI AGSensors1424-82202019-04-01197159910.3390/s19071599s19071599Feature Fusion of Deep Spatial Features and Handcrafted Spatiotemporal Features for Human Action RecognitionMd Azher Uddin0Young-Koo Lee1Department of Computer Science and Engineering, Kyung Hee University, Global Campus, Yongin 17104, KoreaDepartment of Computer Science and Engineering, Kyung Hee University, Global Campus, Yongin 17104, KoreaHuman action recognition plays a significant part in the research community due to its emerging applications. A variety of approaches have been proposed to resolve this problem; however, several issues still need to be addressed. In action recognition, effectively extracting and aggregating spatiotemporal information plays a vital role in describing a video. In this research, we propose a novel approach to recognizing human actions that considers both deep spatial features and handcrafted spatiotemporal features. First, we extract deep spatial features by employing a state-of-the-art deep convolutional network, namely Inception-Resnet-v2. Second, we introduce a novel handcrafted feature descriptor, namely the Weber’s law based Volume Local Gradient Ternary Pattern (WVLGTP), which brings out the spatiotemporal features. It also captures shape information through a gradient operation. Furthermore, a Weber’s law based threshold value and a ternary pattern based on an adaptive local threshold are presented to effectively handle noisy center pixel values. In addition, a multi-resolution approach for WVLGTP based on an averaging scheme is also presented. Afterward, both extracted features are concatenated and fed to a Support Vector Machine to perform the classification. Lastly, extensive experimental analysis shows that our proposed method outperforms state-of-the-art approaches in terms of accuracy.https://www.mdpi.com/1424-8220/19/7/1599deep spatial featuresspatiotemporal featuresInception-Resnet-v2Weber’s law based volume local gradient ternary pattern
spellingShingle Md Azher Uddin
Young-Koo Lee
Feature Fusion of Deep Spatial Features and Handcrafted Spatiotemporal Features for Human Action Recognition
Sensors
deep spatial features
spatiotemporal features
Inception-Resnet-v2
Weber’s law based volume local gradient ternary pattern
title Feature Fusion of Deep Spatial Features and Handcrafted Spatiotemporal Features for Human Action Recognition
title_full Feature Fusion of Deep Spatial Features and Handcrafted Spatiotemporal Features for Human Action Recognition
title_fullStr Feature Fusion of Deep Spatial Features and Handcrafted Spatiotemporal Features for Human Action Recognition
title_full_unstemmed Feature Fusion of Deep Spatial Features and Handcrafted Spatiotemporal Features for Human Action Recognition
title_short Feature Fusion of Deep Spatial Features and Handcrafted Spatiotemporal Features for Human Action Recognition
title_sort feature fusion of deep spatial features and handcrafted spatiotemporal features for human action recognition
topic deep spatial features
spatiotemporal features
Inception-Resnet-v2
Weber’s law based volume local gradient ternary pattern
url https://www.mdpi.com/1424-8220/19/7/1599
work_keys_str_mv AT mdazheruddin featurefusionofdeepspatialfeaturesandhandcraftedspatiotemporalfeaturesforhumanactionrecognition
AT youngkoolee featurefusionofdeepspatialfeaturesandhandcraftedspatiotemporalfeaturesforhumanactionrecognition