Continuous sign language recognition based on hierarchical memory sequence network

Abstract With the goal of solving the problem of feature extractors lacking strong supervision training and insufficient time information concerning single‐sequence model learning, a hierarchical sequence memory network with a multi‐level iterative optimisation strategy is proposed for continuous si...

Bibliographic Details
Main Authors:	Cuihong Xue, Jingli Jia, Ming Yu, Gang Yan, Yingchun Guo, Yuehao Liu
Format:	Article
Language:	English
Published:	Wiley 2024-03-01
Series:	IET Computer Vision
Subjects:	computer vision, gesture recognition, image processing
Online Access:	https://doi.org/10.1049/cvi2.12240

collection	DOAJ
description	Abstract: To address feature extractors that lack strongly supervised training and single‐sequence models that learn insufficient temporal information, a hierarchical sequence memory network with a multi‐level iterative optimisation strategy is proposed for continuous sign language recognition. The method uses a spatial‐temporal fusion convolution network (STFC‐Net) to extract spatial‐temporal information from RGB and optical flow video frames, yielding multi‐modal visual features of a sign language video. To strengthen the temporal relationships among the visual feature maps, the hierarchical memory sequence network then captures local utterance features and global context dependencies across the time dimension to produce sequence features. Finally, a decoder produces the output sentence sequence. To strengthen the feature extractor, the authors adopt a multi‐level iterative optimisation strategy that fine‐tunes STFC‐Net and the utterance feature extractor. Experimental results on the RWTH‐Phoenix‐Weather multi‐signer 2014 dataset and the Chinese sign language dataset show the effectiveness and superiority of this method.
first_indexed	2024-04-24T23:20:10Z
format	Article
id	doaj.art-0da7f4df0f3d4595a6a2845221b3c769
institution	Directory Open Access Journal
issn	1751-9632 1751-9640
language	English
last_indexed	2024-04-24T23:20:10Z
publishDate	2024-03-01
publisher	Wiley
record_format	Article
series	IET Computer Vision
spelling	doaj.art-0da7f4df0f3d4595a6a2845221b3c769 | 2024-03-16T07:56:05Z | eng | Wiley | IET Computer Vision | 1751-9632, 1751-9640 | 2024-03-01 | 18(2), 247-259 | 10.1049/cvi2.12240 | Continuous sign language recognition based on hierarchical memory sequence network | Cuihong Xue (Technical College for the Deaf, Tianjin University of Technology, Tianjin, China); Jingli Jia, Ming Yu, Gang Yan, Yingchun Guo, Yuehao Liu (School of Artificial Intelligence, Hebei University of Technology, Tianjin, China) | https://doi.org/10.1049/cvi2.12240 | computer vision; gesture recognition; image processing
