A Real-Time 3-Dimensional Object Detection Based Human Action Recognition Model

Computer vision technologies have greatly improved in the last few years. Many problems have been solved using deep learning merged with more computational power. Action recognition is one of society's problems that must be addressed. Human Action Recognition (HAR) may be adopted for inte...

Bibliographic Details
Main Authors: Chhaya Gupta, Nasib Singh Gill, Preeti Gulia, Sangeeta Yadav, Giovanni Pau, Mohammad Alibakhshikenari, Xiangjie Kong
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Open Journal of the Computer Society
Subjects: CNN; feature extraction; human action recognition; multiplicative LSTM; skeleton articulation
Online Access: https://ieeexplore.ieee.org/document/10323158/
collection DOAJ
description Computer vision has improved greatly in recent years, and deep learning combined with greater computational power has solved many problems. Action recognition is one problem of societal importance that remains to be addressed. Human Action Recognition (HAR) may be adopted for intelligent video surveillance systems, and governments may use it to monitor crime and for security purposes. This paper proposes a deep learning-based HAR model, namely a 3-dimensional convolutional network with a multiplicative LSTM. The proposed model makes it easier to understand the tasks that an individual or a group of individuals performs. It combines a 3D convolutional neural network (3DCNN) with a multiplicative LSTM recurrent network and YOLOv6 for real-time object detection, and it comprises four stages: data fusion, feature extraction, object identification, and skeleton articulation. The model is trained on several datasets, including NTU-RGB-D, KITTI, NTU-RGB-D 120, UCF 101, and a Fused dataset. It surpasses other cutting-edge models, reaching accuracies of 98.23%, 97.65%, 98.76%, 95.45%, and 97.65% on these datasets, respectively. The state-of-the-art (SOTA) methods used for comparison are a traditional CNN, YOLOv6, and a CNN with BiLSTM. The results indicate that the proposed model, which combines all these techniques, classifies actions more accurately than the existing ones.
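The abstract names a multiplicative LSTM as the recurrent component but does not give its equations. The following is a minimal sketch of one multiplicative LSTM step in plain Python, assuming the commonly cited formulation in which an intermediate state m = (W_mx x) * (W_mh h_prev) replaces the previous hidden state in the gate inputs. All function names, weight names, and sizes here are hypothetical, biases are omitted, and the per-frame feature vectors stand in for whatever the 3DCNN stage would produce; this is not the authors' implementation.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(input_size, hidden_size, seed=0):
    """Create small random weights (hypothetical names; biases omitted)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    params = {
        "mx": mat(hidden_size, input_size),   # input -> intermediate state
        "mh": mat(hidden_size, hidden_size),  # hidden -> intermediate state
    }
    for gate in ("h", "i", "f", "o"):
        params[gate + "x"] = mat(hidden_size, input_size)
        params[gate + "m"] = mat(hidden_size, hidden_size)
    return params


def mlstm_step(params, x, h_prev, c_prev):
    """One multiplicative LSTM step; returns the new (h, c)."""
    # Intermediate state: element-wise product of two linear projections.
    m = [a * b for a, b in zip(matvec(params["mx"], x),
                               matvec(params["mh"], h_prev))]

    def pre(gate):
        # Pre-activation uses x and m (not h_prev, as a plain LSTM would).
        return [a + b for a, b in zip(matvec(params[gate + "x"], x),
                                      matvec(params[gate + "m"], m))]

    h_hat = pre("h")
    i = [sigmoid(z) for z in pre("i")]
    f = [sigmoid(z) for z in pre("f")]
    o = [sigmoid(z) for z in pre("o")]
    c = [fg * cp + ig * math.tanh(u)
         for fg, cp, ig, u in zip(f, c_prev, i, h_hat)]
    h = [math.tanh(cv) * og for cv, og in zip(c, o)]
    return h, c


def run_sequence(params, features, hidden_size):
    """Feed a sequence of per-frame feature vectors through the cell."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in features:
        h, c = mlstm_step(params, x, h, c)
    return h
```

A call such as `run_sequence(init_params(3, 4), [[0.1, 0.2, 0.3], [0.0, -0.5, 0.4]], 4)` returns a final hidden state of length 4, which a classifier layer could then map to action labels.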
first_indexed 2024-03-08T18:46:10Z
format Article
id doaj.art-8b26e7c3c97342e7b9b5c0b04b6ce85b
institution Directory Open Access Journal
issn 2644-1268
language English
last_indexed 2024-03-08T18:46:10Z
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Open Journal of the Computer Society
spelling doaj.art-8b26e7c3c97342e7b9b5c0b04b6ce85b
Record date: 2023-12-29T00:04:06Z
Language: eng
Publisher: IEEE
Journal: IEEE Open Journal of the Computer Society
ISSN: 2644-1268
Published: 2024-01-01
Volume: 5
Pages: 14-26
DOI: 10.1109/OJCS.2023.3334528
Document number: 10323158
Title: A Real-Time 3-Dimensional Object Detection Based Human Action Recognition Model
Authors:
Chhaya Gupta (https://orcid.org/0000-0002-8620-2927), Department of Computer Science and Applications, Maharshi Dayanand University, Rohtak, India
Nasib Singh Gill (https://orcid.org/0000-0002-8594-4320), Department of Computer Science and Applications, Maharshi Dayanand University, Rohtak, India
Preeti Gulia (https://orcid.org/0000-0001-8535-4016), Department of Computer Science and Applications, Maharshi Dayanand University, Rohtak, India
Sangeeta Yadav (https://orcid.org/0000-0003-2625-8096), Department of Computer Science and Applications, Maharshi Dayanand University, Rohtak, India
Giovanni Pau (https://orcid.org/0000-0002-5798-398X), Faculty of Engineering and Architecture, Kore University, Enna, Italy
Mohammad Alibakhshikenari (https://orcid.org/0000-0002-8263-1572), Department of Signal Theory Communications, Universidad Carlos III Madrid, Getafe, Spain
Xiangjie Kong (https://orcid.org/0000-0003-2698-3319), College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Online access: https://ieeexplore.ieee.org/document/10323158/
Keywords: CNN; feature extraction; human action recognition; multiplicative LSTM; skeleton articulation
title A Real-Time 3-Dimensional Object Detection Based Human Action Recognition Model
topic CNN
feature extraction
human action recognition
multiplicative LSTM
skeleton articulation
url https://ieeexplore.ieee.org/document/10323158/