Decoding Silent Speech Based on High-Density Surface Electromyogram Using Spatiotemporal Neural Network

Finer-grained decoding at a phoneme or syllable level is a key technology for continuous recognition of silent speech based on surface electromyogram (sEMG). This paper aims at developing a novel syllable-level decoding method for continuous silent speech recognition (SSR) using spatio-temporal end-...

Bibliographic Details
Main Authors: Xi Chen, Xu Zhang, Xiang Chen, Xun Chen
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Transactions on Neural Systems and Rehabilitation Engineering
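Subjects: Silent speech recognition; high-density surface electromyography; spatiotemporal feature; language model; time sequence decoding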
Online Access: https://ieeexplore.ieee.org/document/10098814/
author Xi Chen
Xu Zhang
Xiang Chen
Xun Chen
collection DOAJ
description Finer-grained decoding at a phoneme or syllable level is a key technology for continuous recognition of silent speech based on surface electromyogram (sEMG). This paper aims to develop a novel syllable-level decoding method for continuous silent speech recognition (SSR) using a spatio-temporal end-to-end neural network. In the proposed method, the high-density sEMG (HD-sEMG) was first converted into a series of feature images, and then a spatio-temporal end-to-end neural network was applied to extract discriminative feature representations and to achieve syllable-level decoding. The effectiveness of the proposed method was verified with HD-sEMG data recorded by four 64-channel electrode arrays placed over facial and laryngeal muscles of fifteen subjects subvocalizing 33 Chinese phrases consisting of 82 syllables. The proposed method outperformed the benchmark methods by achieving the highest phrase classification accuracy (97.17 ± 1.53%, p < 0.05) and a lower character error rate (3.11 ± 1.46%, p < 0.05). This study provides a promising way of decoding sEMG towards SSR, which has great potential applications in instant communication and remote control.
first_indexed 2024-03-13T05:47:55Z
format Article
id doaj.art-8c5342e327894227a2c94f9d6f40f17c
institution Directory Open Access Journal
issn 1558-0210
language English
last_indexed 2024-03-13T05:47:55Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Transactions on Neural Systems and Rehabilitation Engineering
spelling doaj.art-8c5342e327894227a2c94f9d6f40f17c
2023-06-13T20:07:37Z
eng
IEEE
IEEE Transactions on Neural Systems and Rehabilitation Engineering
1558-0210
2023-01-01
vol. 31, pp. 2069-2078
doi:10.1109/TNSRE.2023.3266299
10098814
Decoding Silent Speech Based on High-Density Surface Electromyogram Using Spatiotemporal Neural Network
Xi Chen, https://orcid.org/0000-0001-7889-7151, School of Information Science and Technology, University of Science and Technology of China, Anhui, Hefei, China
Xu Zhang, https://orcid.org/0000-0002-1533-4340, School of Information Science and Technology, University of Science and Technology of China, Anhui, Hefei, China
Xiang Chen, https://orcid.org/0000-0001-8259-4815, School of Information Science and Technology, University of Science and Technology of China, Anhui, Hefei, China
Xun Chen, https://orcid.org/0000-0002-4922-8116, School of Information Science and Technology, University of Science and Technology of China, Anhui, Hefei, China
https://ieeexplore.ieee.org/document/10098814/
Silent speech recognition; high-density surface electromyography; spatiotemporal feature; language model; time sequence decoding
title Decoding Silent Speech Based on High-Density Surface Electromyogram Using Spatiotemporal Neural Network
topic Silent speech recognition
high-density surface electromyography
spatiotemporal feature
language model
time sequence decoding
url https://ieeexplore.ieee.org/document/10098814/