Feature Fusion of Deep Spatial Features and Handcrafted Spatiotemporal Features for Human Action Recognition
Human action recognition plays a significant part in the research community due to its emerging applications. A variety of approaches have been proposed to resolve this problem; however, several issues still need to be addressed. In action recognition, effectively extracting and aggregating spatiotemporal information plays a vital role in describing a video. In this research, we propose a novel approach to recognizing human actions that considers both deep spatial features and handcrafted spatiotemporal features. First, we extract deep spatial features by employing a state-of-the-art deep convolutional network, namely Inception-ResNet-v2. Second, we introduce a novel handcrafted feature descriptor, the Weber’s law based Volume Local Gradient Ternary Pattern (WVLGTP), which brings out the spatiotemporal features and also captures shape information through a gradient operation. Furthermore, a Weber’s law based threshold and a ternary pattern based on an adaptive local threshold are presented to effectively handle noisy center pixel values. A multi-resolution approach for WVLGTP based on an averaging scheme is also presented. Afterward, both extracted features are concatenated and fed to a Support Vector Machine for classification. Lastly, extensive experimental analysis shows that our proposed method outperforms state-of-the-art approaches in terms of accuracy.
Main Authors: | Md Azher Uddin, Young-Koo Lee |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2019-04-01 |
Series: | Sensors |
Subjects: | deep spatial features; spatiotemporal features; Inception-Resnet-v2; Weber’s law based volume local gradient ternary pattern |
Online Access: | https://www.mdpi.com/1424-8220/19/7/1599 |
author | Md Azher Uddin Young-Koo Lee |
collection | DOAJ |
description | Human action recognition plays a significant part in the research community due to its emerging applications. A variety of approaches have been proposed to resolve this problem; however, several issues still need to be addressed. In action recognition, effectively extracting and aggregating spatiotemporal information plays a vital role in describing a video. In this research, we propose a novel approach to recognizing human actions that considers both deep spatial features and handcrafted spatiotemporal features. First, we extract deep spatial features by employing a state-of-the-art deep convolutional network, namely Inception-ResNet-v2. Second, we introduce a novel handcrafted feature descriptor, the Weber’s law based Volume Local Gradient Ternary Pattern (WVLGTP), which brings out the spatiotemporal features and also captures shape information through a gradient operation. Furthermore, a Weber’s law based threshold and a ternary pattern based on an adaptive local threshold are presented to effectively handle noisy center pixel values. A multi-resolution approach for WVLGTP based on an averaging scheme is also presented. Afterward, both extracted features are concatenated and fed to a Support Vector Machine for classification. Lastly, extensive experimental analysis shows that our proposed method outperforms state-of-the-art approaches in terms of accuracy. |
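The description above outlines a two-stream pipeline: deep spatial features from Inception-ResNet-v2, a handcrafted ternary-pattern descriptor with a Weber's law based threshold, concatenation, and an SVM classifier. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the deep feature is a random placeholder standing in for Inception-ResNet-v2 output, the descriptor is a simplified single-frame local gradient ternary pattern rather than the full volume-based multi-resolution WVLGTP, and all function names and the `alpha` parameter are illustrative assumptions.

```python
import numpy as np

def weber_threshold(center, alpha=0.1, eps=1e-8):
    # Weber's law: the just-noticeable change scales with stimulus
    # intensity, so the threshold is proportional to the center pixel
    # value (alpha is an assumed proportionality constant).
    return alpha * center + eps

def local_gradient_ternary_pattern(patch, alpha=0.1):
    """Toy ternary code over a 3x3 patch: each neighbor's difference
    from the center is compared against a Weber-scaled threshold,
    yielding -1 / 0 / +1 per neighbor."""
    center = float(patch[1, 1])
    t = weber_threshold(center, alpha)
    diffs = np.delete(patch.flatten().astype(float), 4) - center  # 8 neighbors
    return np.where(diffs > t, 1, np.where(diffs < -t, -1, 0))

def ternary_histogram(frame, alpha=0.1):
    """Slide over a frame, split the ternary codes into upper (+1) and
    lower (-1) binary patterns (standard ternary-pattern practice) and
    accumulate a 256-bin histogram for each."""
    h, w = frame.shape
    upper_hist, lower_hist = np.zeros(256), np.zeros(256)
    weights = 2 ** np.arange(8)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            codes = local_gradient_ternary_pattern(frame[i-1:i+2, j-1:j+2], alpha)
            upper_hist[int(((codes == 1) * weights).sum())] += 1
            lower_hist[int(((codes == -1) * weights).sum())] += 1
    return np.concatenate([upper_hist, lower_hist])

# Fusion step: concatenate a (placeholder) deep spatial feature vector
# with the handcrafted histogram; the real pipeline would then feed
# this fused vector to a Support Vector Machine.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(16, 16)).astype(float)
deep_feat = rng.standard_normal(1536)  # Inception-ResNet-v2's final pooling width
fused = np.concatenate([deep_feat, ternary_histogram(frame)])
print(fused.shape)  # → (2048,)
```

The split into upper and lower binary histograms keeps the descriptor length at 2 × 256 bins while preserving the three-valued information; the fused vector would be the per-video input to the SVM.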
id | doaj.art-797193ed164e41e3ab81d892c2ea89f6 |
institution | Directory Open Access Journal |
issn | 1424-8220 |
spelling | doaj.art-797193ed164e41e3ab81d892c2ea89f6 | MDPI AG | Sensors, vol. 19, no. 7, article 1599 (2019-04-01) | DOI: 10.3390/s19071599 | Md Azher Uddin and Young-Koo Lee, Department of Computer Science and Engineering, Kyung Hee University, Global Campus, Yongin 17104, Korea |
title | Feature Fusion of Deep Spatial Features and Handcrafted Spatiotemporal Features for Human Action Recognition |
topic | deep spatial features; spatiotemporal features; Inception-Resnet-v2; Weber’s law based volume local gradient ternary pattern |
url | https://www.mdpi.com/1424-8220/19/7/1599 |