An Efficient Two-Stream Network for Isolated Sign Language Recognition Using Accumulative Video Motion
Sign language is the primary communication medium for persons with hearing impairments. This language depends mainly on hand articulations accompanied by nonmanual gestures. Recently, there has been a growing interest in sign language recognition. In this paper, we propose a trainable deep learning...
Main Author: | Hamzah Luqman |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2022-01-01 |
Series: | IEEE Access |
Subjects: | Sign language recognition; Arabic sign language; Argentinian sign language; KArSL; LSA64; gesture recognition |
Online Access: | https://ieeexplore.ieee.org/document/9875269/ |
_version_ | 1798033327350874112 |
---|---|
author | Hamzah Luqman |
author_facet | Hamzah Luqman |
author_sort | Hamzah Luqman |
collection | DOAJ |
description | Sign language is the primary communication medium for persons with hearing impairments. This language depends mainly on hand articulations accompanied by nonmanual gestures. Recently, there has been a growing interest in sign language recognition. In this paper, we propose a trainable deep learning network for isolated sign language recognition that can effectively capture spatiotemporal information using a small number of frames per sign. We propose a hierarchical sign learning module that comprises three networks: a dynamic motion network (DMN), an accumulative motion network (AMN), and a sign recognition network (SRN). Additionally, we propose a technique to extract key postures for handling the variations in sign samples performed by different signers. The DMN stream uses these key postures to learn the spatiotemporal information pertaining to the signs. We also propose a novel technique to represent the static and dynamic information of sign gestures in a single frame. This approach preserves the spatial and temporal information of the sign by fusing the sign’s key postures in the forward and backward directions to generate an accumulative video motion frame. This frame is used as input to the AMN stream, and the extracted features are fused with the DMN features and fed into the SRN for the learning and classification of signs. The proposed approach is efficient for isolated sign language recognition, especially for recognizing static signs. We evaluated this approach on the KArSL-190 and KArSL-502 Arabic sign language datasets, and the results on KArSL-190 outperformed other techniques by 15% in the signer-independent mode. Additionally, the proposed approach outperformed the state-of-the-art techniques on the Argentinian sign language dataset LSA64. The code is available at <uri>https://github.com/Hamzah-Luqman/SLR_AMN</uri>. |
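The sketch below is a rough illustration of the accumulative video motion idea described in the abstract: key-posture frames are blended in the forward and backward directions and combined into a single image that can serve as the AMN input. It is not the author's released implementation (see the GitHub repository above for that); the linear weighting scheme, the simple averaging of the two passes, and the function names are assumptions made purely for illustration.

```python
# Hypothetical sketch of an accumulative video motion frame, assuming key postures
# are already extracted as a list of HxWx3 uint8 arrays. Weighting and fusion
# choices here are illustrative assumptions, not the paper's exact method.
import numpy as np

def accumulate(frames, reverse=False):
    """Weighted accumulation of key-posture frames into one image.

    Later frames in the chosen direction receive larger weights, so the
    temporal order of the gesture is encoded in the blended result.
    """
    order = frames[::-1] if reverse else frames
    weights = np.linspace(0.2, 1.0, num=len(order))   # assumed weighting scheme
    acc = np.zeros_like(order[0], dtype=np.float32)
    for w, f in zip(weights, order):
        acc += w * f.astype(np.float32)
    acc /= weights.sum()                               # renormalize to [0, 255]
    return acc.astype(np.uint8)

def accumulative_motion_frame(frames):
    """Fuse forward and backward accumulations into a single frame."""
    forward = accumulate(frames, reverse=False)
    backward = accumulate(frames, reverse=True)
    fused = (forward.astype(np.float32) + backward.astype(np.float32)) / 2.0
    return fused.astype(np.uint8)
```

In this reading, the single fused frame carries both the static hand configurations and a trace of their motion, which is why it can be processed by an ordinary image stream (the AMN) rather than a video model.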
first_indexed | 2024-04-11T20:28:36Z |
format | Article |
id | doaj.art-c70bdc46689645628c82ea4bc660a660 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-11T20:28:36Z |
publishDate | 2022-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-c70bdc46689645628c82ea4bc660a660 | 2022-12-22T04:04:36Z | eng | IEEE | IEEE Access | 2169-3536 | 2022-01-01 | vol. 10, pp. 93785-93798 | 10.1109/ACCESS.2022.3204110 | 9875269 | An Efficient Two-Stream Network for Isolated Sign Language Recognition Using Accumulative Video Motion | Hamzah Luqman | https://orcid.org/0000-0001-7944-5093 | Information and Computer Science Department, College of Computing and Mathematics, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia | https://ieeexplore.ieee.org/document/9875269/ | Sign language recognition; Arabic sign language; Argentinian sign language; KArSL; LSA64; gesture recognition |
spellingShingle | Hamzah Luqman An Efficient Two-Stream Network for Isolated Sign Language Recognition Using Accumulative Video Motion IEEE Access Sign language recognition Arabic sign language Argentinian sign language KArSL LSA64 gesture recognition |
title | An Efficient Two-Stream Network for Isolated Sign Language Recognition Using Accumulative Video Motion |
title_full | An Efficient Two-Stream Network for Isolated Sign Language Recognition Using Accumulative Video Motion |
title_fullStr | An Efficient Two-Stream Network for Isolated Sign Language Recognition Using Accumulative Video Motion |
title_full_unstemmed | An Efficient Two-Stream Network for Isolated Sign Language Recognition Using Accumulative Video Motion |
title_short | An Efficient Two-Stream Network for Isolated Sign Language Recognition Using Accumulative Video Motion |
title_sort | efficient two stream network for isolated sign language recognition using accumulative video motion |
topic | Sign language recognition Arabic sign language Argentinian sign language KArSL LSA64 gesture recognition |
url | https://ieeexplore.ieee.org/document/9875269/ |
work_keys_str_mv | AT hamzahluqman anefficienttwostreamnetworkforisolatedsignlanguagerecognitionusingaccumulativevideomotion AT hamzahluqman efficienttwostreamnetworkforisolatedsignlanguagerecognitionusingaccumulativevideomotion |