An Attention-Enhanced Multi-Scale and Dual Sign Language Recognition Network Based on a Graph Convolution Network

Sign language is the most important means of communication for hearing-impaired people, and research on sign language recognition can help hearing people understand it. We reviewed classic sign language recognition methods and found that their accuracy is limited by redundant information, finger occlusion, motion blur, the diverse signing styles of different people, and so on. To overcome these shortcomings, we propose a multi-scale and dual sign language recognition network (SLR-Net) based on a graph convolutional network (GCN). The input is RGB video, from which we first extract skeleton data and then use that data for recognition. SLR-Net consists of three sub-modules: a multi-scale attention network (MSA), a multi-scale spatiotemporal attention network (MSSTA), and an attention-enhanced temporal convolution network (ATCN). MSA lets the GCN learn dependencies between long-distance vertices; MSSTA learns spatiotemporal features directly; ATCN helps the network learn long-range temporal dependencies. Three attention mechanisms (multi-scale, spatiotemporal, and temporal) further improve robustness and accuracy. In addition, we propose a keyframe extraction algorithm that greatly improves efficiency at the cost of a small loss in accuracy. Experimental results showed that our method reaches a 98.08% accuracy rate on the CSL-500 dataset with a 500-word vocabulary. Even on the challenging DEVISIGN-L dataset with a 2000-word vocabulary, it reaches a 64.57% accuracy rate, outperforming other state-of-the-art sign language recognition methods.
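The paper's SLR-Net modules are not reproduced here, but the building blocks the abstract names, a graph convolution over skeleton joints, attention over time, and keyframe selection, can be sketched in plain NumPy. Everything below is an illustrative assumption: the function names, the adjacency normalization, and the motion-magnitude keyframe heuristic are generic textbook choices, not the authors' implementation.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetrically normalize A + I (a standard GCN preprocessing step)."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(X, A_norm, W):
    """One graph convolution over skeleton joints.

    X: (T, V, C_in) joint features for T frames and V joints,
    A_norm: (V, V) normalized skeleton adjacency, W: (C_in, C_out).
    A_norm @ X broadcasts over the time axis; ReLU is applied pointwise.
    """
    return np.maximum(A_norm @ X @ W, 0.0)

def temporal_attention(H):
    """Softmax attention over frames; collapses (T, V, C) to a (V, C) clip descriptor."""
    scores = H.mean(axis=(1, 2))           # one scalar score per frame, shape (T,)
    w = np.exp(scores - scores.max())
    w = w / w.sum()
    return np.tensordot(w, H, axes=1)      # attention-weighted sum over frames

def select_keyframes(seq, k):
    """Keep the k frames with the largest inter-frame joint motion (illustrative heuristic)."""
    motion = np.linalg.norm(np.diff(seq, axis=0), axis=(1, 2))  # (T-1,)
    motion = np.concatenate([[motion[0]], motion])              # pad so frame 0 is scored too
    idx = np.sort(np.argsort(motion)[-k:])                      # top-k, kept in temporal order
    return seq[idx]
```

For a clip of 10 frames, 5 joints, and 3 channels, `gcn_layer` maps `(10, 5, 3)` features to `(10, 5, C_out)`, `temporal_attention` collapses them to a single `(5, C_out)` descriptor, and `select_keyframes(seq, 4)` returns the 4 highest-motion frames in their original order.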


Bibliographic Details
Main Authors: Lu Meng, Ronghui Li
Format: Article
Language: English
Published: MDPI AG, 2021-02-01
Series: Sensors
Subjects: sign language recognition; GCN; attention mechanism; keyframes extraction; large-vocabulary
Online Access: https://www.mdpi.com/1424-8220/21/4/1120