Human Activity Recognition Using Cascaded Dual Attention CNN and Bi-Directional GRU Framework

Vision-based human activity recognition (HAR) has emerged as one of the essential research areas in video analytics. Over the last decade, numerous advanced deep learning algorithms have been introduced to recognize complex human actions from video streams. These deep learning algorithms have shown...

Bibliographic Details
Main Authors:	Hayat Ullah, Arslan Munir
Format:	Article
Language:	English
Published:	MDPI AG 2023-06-01
Series:	Journal of Imaging
Subjects:	convolutional neural network; channel–spatial attention; activity recognition; gated recurrent unit; pattern recognition; deep learning
Online Access:	https://www.mdpi.com/2313-433X/9/7/130

_version_	1797588783372173312
author	Hayat Ullah, Arslan Munir
collection	DOAJ
description	Vision-based human activity recognition (HAR) has emerged as one of the essential research areas in video analytics. Over the last decade, numerous advanced deep learning algorithms have been introduced to recognize complex human actions from video streams. These deep learning algorithms have shown impressive performance on video analytics tasks. However, these newly introduced methods focus exclusively on either model performance or computational efficiency, resulting in a biased trade-off between robustness and computational efficiency when dealing with the challenging HAR problem. To enhance both accuracy and computational efficiency, this paper presents a computationally efficient yet generic spatial–temporal cascaded framework that exploits deep discriminative spatial and temporal features for HAR. For efficient representation of human actions, we propose a dual attentional convolutional neural network (DA-CNN) architecture that leverages a unified channel–spatial attention mechanism to extract human-centric salient features from video frames. The dual channel–spatial attention layers, together with the convolutional layers, learn to be more selective in the spatial receptive fields that contain objects within the feature maps. The extracted discriminative salient features are then forwarded to a stacked bi-directional gated recurrent unit (Bi-GRU) for long-term temporal modeling and recognition of human actions using both forward and backward pass gradient learning. Extensive experiments are conducted on three publicly available human action datasets, where the obtained results verify the effectiveness of our proposed framework (DA-CNN+Bi-GRU) over state-of-the-art methods in terms of model accuracy and inference runtime on each dataset. Experimental results show that the DA-CNN+Bi-GRU framework attains an improvement in execution time of up to 167× in terms of frames per second compared to most contemporary action-recognition methods.
first_indexed	2024-03-11T00:56:56Z
format	Article
id	doaj.art-9e489813352043858e9537a361069c07
institution	Directory Open Access Journal
issn	2313-433X
language	English
last_indexed	2024-03-11T00:56:56Z
publishDate	2023-06-01
publisher	MDPI AG
record_format	Article
series	Journal of Imaging
spelling	doaj.art-9e489813352043858e9537a361069c07; 2023-11-18T19:56:55Z; eng; MDPI AG; Journal of Imaging; ISSN 2313-433X; 2023-06-01; vol. 9, no. 7, article 130; DOI 10.3390/jimaging9070130; Human Activity Recognition Using Cascaded Dual Attention CNN and Bi-Directional GRU Framework; Hayat Ullah (Department of Computer Science, Kansas State University, Manhattan, KS 66506, USA); Arslan Munir (Department of Computer Science, Kansas State University, Manhattan, KS 66506, USA); https://www.mdpi.com/2313-433X/9/7/130; convolutional neural network; channel–spatial attention; activity recognition; gated recurrent unit; pattern recognition; deep learning
title	Human Activity Recognition Using Cascaded Dual Attention CNN and Bi-Directional GRU Framework
topic	convolutional neural network; channel–spatial attention; activity recognition; gated recurrent unit; pattern recognition; deep learning
url	https://www.mdpi.com/2313-433X/9/7/130


Similar Items