Multi-Stream General and Graph-Based Deep Neural Networks for Skeleton-Based Sign Language Recognition

Sign language recognition (SLR) aims to bridge speech-impaired and general communities by recognizing signs from given videos. However, due to the complex background, light illumination, and subject structures in videos, researchers still face challenges in developing effective SLR systems. Many res...

Bibliographic Details
Main Authors:	Abu Saleh Musa Miah, Md. Al Mehedi Hasan, Si-Woong Jang, Hyoun-Sup Lee, Jungpil Shin
Format:	Article
Language:	English
Published:	MDPI AG 2023-06-01
Series:	Electronics
Subjects:	sign language recognition (SLR); large scale dataset; American Sign Language; Turkish Sign Language; Chinese Sign Language; AUTSL
Online Access:	https://www.mdpi.com/2079-9292/12/13/2841

_version_	1797591816014397440
author	Abu Saleh Musa Miah; Md. Al Mehedi Hasan; Si-Woong Jang; Hyoun-Sup Lee; Jungpil Shin
author_facet	Abu Saleh Musa Miah; Md. Al Mehedi Hasan; Si-Woong Jang; Hyoun-Sup Lee; Jungpil Shin
author_sort	Abu Saleh Musa Miah
collection	DOAJ
description	Sign language recognition (SLR) aims to bridge speech-impaired and general communities by recognizing signs from given videos. However, due to the complex background, light illumination, and subject structures in videos, researchers still face challenges in developing effective SLR systems. Many researchers have recently sought to develop skeleton-based sign language recognition systems to overcome the subject and background variation in hand gesture sign videos. However, skeleton-based SLR is still under exploration, mainly due to a lack of information and hand key point annotations. More recently, researchers have included body and face information along with hand gesture information for SLR; however, the obtained performance accuracy and generalizability properties remain unsatisfactory. In this paper, we propose a multi-stream graph-based deep neural network (SL-GDN) for a skeleton-based SLR system in order to overcome the above-mentioned problems. The main purpose of the proposed SL-GDN approach is to improve the generalizability and performance accuracy of the SLR system while maintaining a low computational cost based on the human body pose in the form of 2D landmark locations. We first construct a skeleton graph based on 27 whole-body key points selected among 67 key points to address the high computational cost problem. Then, we utilize the multi-stream SL-GDN to extract features from the whole-body skeleton graph considering four streams. Finally, we concatenate the four different features and apply a classification module to refine the features and recognize corresponding sign classes. Our data-driven graph construction method increases the system’s flexibility and brings high generalizability, allowing it to adapt to varied data. We use two large-scale benchmark SLR data sets to evaluate the proposed model: The Turkish Sign Language data set (AUTSL) and Chinese Sign Language (CSL). The reported performance accuracy results demonstrate the outstanding ability of the proposed model, and we believe that it will be considered a great innovation in the SLR domain.
first_indexed	2024-03-11T01:43:44Z
format	Article
id	doaj.art-21dff0d49fde42d8bbb8deb75f430019
institution	Directory Open Access Journal
issn	2079-9292
language	English
last_indexed	2024-03-11T01:43:44Z
publishDate	2023-06-01
publisher	MDPI AG
record_format	Article
series	Electronics
spelling	doaj.art-21dff0d49fde42d8bbb8deb75f430019 | 2023-11-18T16:24:12Z | eng | MDPI AG | Electronics | 2079-9292 | 2023-06-01 | vol. 12, no. 13, art. 2841 | 10.3390/electronics12132841 | Multi-Stream General and Graph-Based Deep Neural Networks for Skeleton-Based Sign Language Recognition | Abu Saleh Musa Miah (School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu 965-8580, Japan); Md. Al Mehedi Hasan (Department of Computer Science and Engineering, Rajshahi University of Engineering and Technology (RUET), Rajshahi 6204, Bangladesh); Si-Woong Jang (Department of Computer Engineering, Dongeui University, Busanjin-Gu, Busan 47340, Republic of Korea); Hyoun-Sup Lee (Department of Applied Software Engineering, Dongeui University, Busanjin-Gu, Busan 47340, Republic of Korea); Jungpil Shin (School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu 965-8580, Japan) | https://www.mdpi.com/2079-9292/12/13/2841 | sign language recognition (SLR); large scale dataset; American Sign Language; Turkish Sign Language; Chinese Sign Language; AUTSL
title	Multi-Stream General and Graph-Based Deep Neural Networks for Skeleton-Based Sign Language Recognition
title_sort	multi stream general and graph based deep neural networks for skeleton based sign language recognition
topic	sign language recognition (SLR); large scale dataset; American Sign Language; Turkish Sign Language; Chinese Sign Language; AUTSL
url	https://www.mdpi.com/2079-9292/12/13/2841
