Chinese Lip-Reading Research Based on ShuffleNet and CBAM

Lip reading has attracted increasing attention recently due to advances in deep learning. However, most research targets English datasets. The study of Chinese lip-reading technology is still in its initial stage. Firstly, in this paper, we expand the naturally distributed word-level Chinese dataset...

Full description

Bibliographic Details
Main Authors: Yixian Fu, Yuanyao Lu, Ran Ni
Format: Article
Language:English
Published: MDPI AG 2023-01-01
Series:Applied Sciences
Subjects:
Online Access:https://www.mdpi.com/2076-3417/13/2/1106
Description
Summary:Lip reading has attracted increasing attention recently due to advances in deep learning. However, most research targets English datasets. The study of Chinese lip-reading technology is still in its initial stage. Firstly, in this paper, we expand the naturally distributed word-level Chinese dataset called ‘Databox’ previously built by our laboratory. Secondly, the current state-of-the-art model consists of a residual network and a temporal convolutional network. The residual network leads to excessive computational cost and is not suitable for the on-device applications. In the new model, the residual network is replaced with ShuffleNet, which is an extremely computation-efficient Convolutional Neural Network (CNN) architecture. Thirdly, to help the network focus on the most useful information, we insert a simple but effective attention module called Convolutional Block Attention Module (CBAM) into the ShuffleNet. In our experiment, we compare several model architectures and find that our model achieves a comparable accuracy to the residual network (3.5 GFLOPs) under the computational budget of 1.01 GFLOPs.
ISSN:2076-3417