Content-Adaptive and Attention-Based Network for Hand Gesture Recognition
Main Authors: | Zongjing Cao, Yan Li, Byeong-Seok Shin |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-02-01 |
Series: | Applied Sciences |
Online Access: | https://www.mdpi.com/2076-3417/12/4/2041 |
author | Zongjing Cao, Yan Li, Byeong-Seok Shin |
collection | DOAJ |
description | For hand gesture recognition, recurrent neural networks and 3D convolutional neural networks are the most commonly used methods for learning the spatial–temporal features of gestures. The hidden state of a recurrent neural network at a given time step depends on both the current input and the hidden state from the previous time step, which limits parallel computation. 3D convolution-based methods, in turn, incur high computational costs because of the large number of weight parameters that must be optimized. We introduced a transformer-based network for hand gesture recognition: a purely self-attention architecture without any convolutional or recurrent layers. The framework classifies hand gestures by focusing on the sequence information of the whole gesture video. In addition, we introduced an adaptive sampling strategy based on the video content that reduces the number of gesture-free frames fed to the model, thereby lowering computational cost. The proposed network achieved 83.2% and 93.8% recognition accuracy on two publicly available benchmark datasets, NVGesture and EgoGesture, respectively. The results of extensive comparison experiments show that our proposed approach outperforms existing state-of-the-art gesture recognition systems. |
issn | 2076-3417 |
doi | 10.3390/app12042041 |
volume | 12 |
issue | 4 |
article number | 2041 |
affiliation | Department of Electrical and Computer Engineering, Inha University, Incheon 22212, Korea (all three authors) |
topic | content-adaptive; attention mechanism; gesture recognition; hand detection |
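The record does not say how the content-adaptive sampling decides that a frame is gesture-free. The sketch below is only an illustration under stated assumptions. It supposes each frame already has a hand-presence score (for example from a hand detector, which the "hand detection" keyword hints at but does not confirm). It keeps the frames whose score reaches a threshold and spreads a fixed number of picks uniformly over them. The function `adaptive_sample` and its parameters `hand_scores`, `num_frames`, and `threshold` are hypothetical and are not the authors' code.

```python
from typing import List, Sequence


def adaptive_sample(hand_scores: Sequence[float], num_frames: int,
                    threshold: float = 0.5) -> List[int]:
    """Pick `num_frames` frame indices, skipping likely gesture-free frames.

    Hypothetical criterion: a frame counts as gesture-bearing when its
    hand-presence score is at least `threshold`. This is an assumption for
    illustration, not the sampling rule used in the paper.
    """
    if num_frames <= 0:
        return []
    # Keep only frames in which a hand is (assumed to be) visible.
    active = [i for i, score in enumerate(hand_scores) if score >= threshold]
    if not active:
        # No frame passes the threshold: fall back to the full clip.
        active = list(range(len(hand_scores)))
    if not active:
        return []  # empty clip, nothing to sample
    n = len(active)
    # Spread the picks evenly over the retained frames; indices repeat
    # when fewer than `num_frames` frames survive the filter.
    return [active[(k * n) // num_frames] for k in range(num_frames)]


if __name__ == "__main__":
    scores = [0.1, 0.2, 0.9, 0.8, 0.95, 0.7, 0.1, 0.05]
    print(adaptive_sample(scores, 2))  # [2, 4]
    print(adaptive_sample(scores, 8))  # [2, 2, 3, 3, 4, 4, 5, 5]
```

In the example, the low-scoring frames at both ends of the clip (indices 0, 1, 6, 7) are never passed on to the classifier. This is the kind of saving the abstract attributes to its sampling step, although the actual criterion the authors use may differ.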