Content-Adaptive and Attention-Based Network for Hand Gesture Recognition

For hand gesture recognition, recurrent neural networks and 3D convolutional neural networks are the most commonly used methods for learning the spatial–temporal features of gestures. The calculation of the hidden state of the recurrent neural network at a specific time is determined by both input a...


Bibliographic Details
Main Authors: Zongjing Cao, Yan Li, Byeong-Seok Shin
Format: Article
Language: English
Published: MDPI AG 2022-02-01
Series: Applied Sciences
Subjects: content-adaptive, attention mechanism, gesture recognition, hand detection
Online Access: https://www.mdpi.com/2076-3417/12/4/2041
_version_ 1797483007726059520
author Zongjing Cao
Yan Li
Byeong-Seok Shin
author_facet Zongjing Cao
Yan Li
Byeong-Seok Shin
author_sort Zongjing Cao
collection DOAJ
description For hand gesture recognition, recurrent neural networks and 3D convolutional neural networks are the most commonly used methods for learning the spatial–temporal features of gestures. In a recurrent neural network, the hidden state at a given time step depends on both the input at that step and the hidden state of the previous step, which limits parallel computation. In 3D convolution-based methods, the large number of weight parameters that must be optimized leads to high computational cost. We introduce a transformer-based network for hand gesture recognition, a fully self-attentional architecture without any convolutional or recurrent layers. The framework classifies hand gestures by attending to the sequence information of the whole gesture video. In addition, we introduce an adaptive sampling strategy based on the video content that reduces the number of gesture-free frames fed to the model, thus lowering computational cost. The proposed network achieved 83.2% and 93.8% recognition accuracy on the publicly available NVGesture and EgoGesture benchmark datasets, respectively. The results of extensive comparison experiments show that the proposed approach outperforms existing state-of-the-art gesture recognition systems.
first_indexed 2024-03-09T22:41:44Z
format Article
id doaj.art-8ab3d6b3cd1c44609a092773864f4354
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-09T22:41:44Z
publishDate 2022-02-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-8ab3d6b3cd1c44609a092773864f4354 2023-11-23T18:38:17Z
eng
MDPI AG
Applied Sciences (ISSN 2076-3417)
2022-02-01, vol. 12, no. 4, article 2041
10.3390/app12042041
Content-Adaptive and Attention-Based Network for Hand Gesture Recognition
Zongjing Cao, Department of Electrical and Computer Engineering, Inha University, Incheon 22212, Korea
Yan Li, Department of Electrical and Computer Engineering, Inha University, Incheon 22212, Korea
Byeong-Seok Shin, Department of Electrical and Computer Engineering, Inha University, Incheon 22212, Korea
https://www.mdpi.com/2076-3417/12/4/2041
content-adaptive
attention mechanism
gesture recognition
hand detection
spellingShingle Zongjing Cao
Yan Li
Byeong-Seok Shin
Content-Adaptive and Attention-Based Network for Hand Gesture Recognition
Applied Sciences
content-adaptive
attention mechanism
gesture recognition
hand detection
title Content-Adaptive and Attention-Based Network for Hand Gesture Recognition
title_full Content-Adaptive and Attention-Based Network for Hand Gesture Recognition
title_fullStr Content-Adaptive and Attention-Based Network for Hand Gesture Recognition
title_full_unstemmed Content-Adaptive and Attention-Based Network for Hand Gesture Recognition
title_short Content-Adaptive and Attention-Based Network for Hand Gesture Recognition
title_sort content adaptive and attention based network for hand gesture recognition
topic content-adaptive
attention mechanism
gesture recognition
hand detection
url https://www.mdpi.com/2076-3417/12/4/2041
work_keys_str_mv AT zongjingcao contentadaptiveandattentionbasednetworkforhandgesturerecognition
AT yanli contentadaptiveandattentionbasednetworkforhandgesturerecognition
AT byeongseokshin contentadaptiveandattentionbasednetworkforhandgesturerecognition