Human Action Recognition by Learning Spatio-Temporal Features with Deep Neural Networks

Human action recognition plays a crucial role in various applications, including video surveillance, human-computer interaction, and activity analysis. This paper presents a study on human action recognition by leveraging a CNN-LSTM architecture with an attention model. The proposed approach aims to capture both spatial and temporal information from videos in order to recognize human actions. We utilize the UCF-101 and UCF-50 datasets, which are widely used benchmark datasets for action recognition. The UCF-101 dataset consists of 101 action classes, while the UCF-50 dataset comprises 50 action classes, both encompassing diverse human activities. Our CNN-LSTM model integrates a CNN as the feature extractor to capture spatial information from video frames. Subsequently, the extracted features are fed into an LSTM network to capture temporal dependencies and sequence information. To enhance the discriminative power of the model, an attention model is incorporated to improve the activation patterns and highlight relevant features. Furthermore, the study provides insights into the importance of leveraging both spatial and temporal information for accurate action recognition. The findings highlight the efficacy of the CNN-LSTM architecture with an attention model in capturing meaningful patterns in video sequences and improving action recognition accuracy.


Bibliographic Details
Main Authors: Haindavi P., Sharif Shaik, Lakshman A., Aerranagula Veerender, Reddy P. Chandra Sekhar, Kumar Anuj
Format: Article
Language: English
Published: EDP Sciences 2023-01-01
Series: E3S Web of Conferences
Subjects:
Online Access: https://www.e3s-conferences.org/articles/e3sconf/pdf/2023/67/e3sconf_icmpc2023_01154.pdf
_version_ 1797658689511882752
author Haindavi P.
Sharif Shaik
Lakshman A.
Aerranagula Veerender
Reddy P. Chandra Sekhar
Kumar Anuj
author_facet Haindavi P.
Sharif Shaik
Lakshman A.
Aerranagula Veerender
Reddy P. Chandra Sekhar
Kumar Anuj
author_sort Haindavi P.
collection DOAJ
description Human action recognition plays a crucial role in various applications, including video surveillance, human-computer interaction, and activity analysis. This paper presents a study on human action recognition by leveraging CNN-LSTM architecture with an attention model. The proposed approach aims to capture both spatial and temporal information from videos in order to recognize human actions. We utilize the UCF-101 and UCF-50 datasets, which are widely used benchmark datasets for action recognition. The UCF-101 dataset consists of 101 action classes, while the UCF-50 dataset comprises 50 action classes, both encompassing diverse human activities. Our CNN-LSTM model integrates a CNN as the feature extractor to capture spatial information from video frames. Subsequently, the extracted features are fed into an LSTM network to capture temporal dependencies and sequence information. To enhance the discriminative power of the model, an attention model is incorporated to improve the activation patterns and highlight relevant features. Furthermore, the study provides insights into the importance of leveraging both spatial and temporal information for accurate action recognition. The findings highlight the efficacy of the CNN-LSTM architecture with an attention model in capturing meaningful patterns in video sequences and improving action recognition accuracy.
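The attention step the abstract describes (weighting per-frame features, such as LSTM outputs, so that the most relevant frames dominate the clip-level representation) can be sketched in pure Python. This is a minimal illustration only, not the authors' implementation: `attention_pool` and `score_weights` are hypothetical names standing in for a learned attention layer, and real models would operate on tensors via a deep learning framework.

```python
import math

def softmax(scores):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frame_features, score_weights):
    # frame_features: T x D list of per-frame feature vectors (e.g. LSTM outputs).
    # score_weights: D-dim scoring vector (hypothetical stand-in for a learned
    # attention projection). Each frame gets a scalar relevance score.
    scores = [sum(w * f for w, f in zip(score_weights, feat))
              for feat in frame_features]
    alphas = softmax(scores)  # attention weights over the T frames
    dim = len(frame_features[0])
    # Weighted sum of frame features -> single clip-level descriptor.
    return [sum(a * feat[d] for a, feat in zip(alphas, frame_features))
            for d in range(dim)]
```

With uniform scores the pooling reduces to a plain average over frames; a trained scoring vector instead concentrates weight on discriminative frames, which is the "highlight relevant features" behavior the abstract refers to.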
first_indexed 2024-03-11T18:02:58Z
format Article
id doaj.art-321b7114373d4a3e87f3a3db42812148
institution Directory Open Access Journal
issn 2267-1242
language English
last_indexed 2024-03-11T18:02:58Z
publishDate 2023-01-01
publisher EDP Sciences
record_format Article
series E3S Web of Conferences
spelling doaj.art-321b7114373d4a3e87f3a3db428121482023-10-17T08:47:38ZengEDP SciencesE3S Web of Conferences2267-12422023-01-014300115410.1051/e3sconf/202343001154e3sconf_icmpc2023_01154Human Action Recognition by Learning Spatio-Temporal Features with Deep Neural NetworksHaindavi P.0Sharif Shaik1Lakshman A.2Aerranagula Veerender3Reddy P. Chandra Sekhar4Kumar Anuj5Dept. of CSE- Data Science, KG Reddy College of Engineering & TechnologyDept. of CSE-AI&ML, CMR Technical Campus, KandlakoyaDept. of CSE- Data Science, CMR Technical Campus, KandlakoyaDept. of CSE- Data Science, CMR Technical Campus, KandlakoyaProfessor, Department of Computer Science and Engineering, GRIET, Bachupally,Uttaranchal Institute of Technology, Uttaranchal UniversityHuman action recognition plays a crucial role in various applications, including video surveillance, human-computer interaction, and activity analysis. This paper presents a study on human action recognition by leveraging CNN-LSTM architecture with an attention model. The proposed approach aims to capture both spatial and temporal information from videos in order to recognize human actions. We utilize the UCF-101 and UCF-50 datasets, which are widely used benchmark datasets for action recognition. The UCF-101 dataset consists of 101 action classes, while the UCF-50 dataset comprises 50 action classes, both encompassing diverse human activities. Our CNN-LSTM model integrates a CNN as the feature extractor to capture spatial information from video frames. Subsequently, the extracted features are fed into an LSTM network to capture temporal dependencies and sequence information. To enhance the discriminative power of the model, an attention model is incorporated to improve the activation patterns and highlight relevant features. Furthermore, the study provides insights into the importance of leveraging both spatial and temporal information for accurate action recognition. 
The findings highlight the efficacy of the CNN-LSTM architecture with an attention model in capturing meaningful patterns in video sequences and improving action recognition accuracy.https://www.e3s-conferences.org/articles/e3sconf/pdf/2023/67/e3sconf_icmpc2023_01154.pdfcnn-lstmdeep learningrecognize human action
spellingShingle Haindavi P.
Sharif Shaik
Lakshman A.
Aerranagula Veerender
Reddy P. Chandra Sekhar
Kumar Anuj
Human Action Recognition by Learning Spatio-Temporal Features with Deep Neural Networks
E3S Web of Conferences
cnn-lstm
deep learning
recognize human action
title Human Action Recognition by Learning Spatio-Temporal Features with Deep Neural Networks
title_full Human Action Recognition by Learning Spatio-Temporal Features with Deep Neural Networks
title_fullStr Human Action Recognition by Learning Spatio-Temporal Features with Deep Neural Networks
title_full_unstemmed Human Action Recognition by Learning Spatio-Temporal Features with Deep Neural Networks
title_short Human Action Recognition by Learning Spatio-Temporal Features with Deep Neural Networks
title_sort human action recognition by learning spatio temporal features with deep neural networks
topic cnn-lstm
deep learning
recognize human action
url https://www.e3s-conferences.org/articles/e3sconf/pdf/2023/67/e3sconf_icmpc2023_01154.pdf
work_keys_str_mv AT haindavip humanactionrecognitionbylearningspatiotemporalfeatureswithdeepneuralnetworks
AT sharifshaik humanactionrecognitionbylearningspatiotemporalfeatureswithdeepneuralnetworks
AT lakshmana humanactionrecognitionbylearningspatiotemporalfeatureswithdeepneuralnetworks
AT aerranagulaveerender humanactionrecognitionbylearningspatiotemporalfeatureswithdeepneuralnetworks
AT reddypchandrasekhar humanactionrecognitionbylearningspatiotemporalfeatureswithdeepneuralnetworks
AT kumaranuj humanactionrecognitionbylearningspatiotemporalfeatureswithdeepneuralnetworks