Chinese Lip-Reading Research Based on ShuffleNet and CBAM

Lip reading has attracted increasing attention recently due to advances in deep learning. However, most research targets English datasets, and the study of Chinese lip-reading technology is still at an early stage. First, we expand the naturally distributed word-level Chinese dataset...

Bibliographic Details
Main Authors: Yixian Fu, Yuanyao Lu, Ran Ni
Format: Article
Language: English
Published: MDPI AG 2023-01-01
Series: Applied Sciences
Subjects: Chinese lip-reading; ShuffleNet; CBAM; light-weight network
Online Access: https://www.mdpi.com/2076-3417/13/2/1106
collection DOAJ
description Lip reading has attracted increasing attention recently due to advances in deep learning. However, most research targets English datasets, and the study of Chinese lip-reading technology is still at an early stage. First, we expand ‘Databox’, the naturally distributed word-level Chinese dataset previously built by our laboratory. Second, the current state-of-the-art model consists of a residual network and a temporal convolutional network; the residual network incurs excessive computational cost and is not suitable for on-device applications. In the new model, the residual network is replaced with ShuffleNet, an extremely computation-efficient convolutional neural network (CNN) architecture. Third, to help the network focus on the most useful information, we insert a simple but effective attention module, the Convolutional Block Attention Module (CBAM), into ShuffleNet. In our experiments, we compare several model architectures and find that our model achieves accuracy comparable to that of the residual network (3.5 GFLOPs) under a computational budget of 1.01 GFLOPs.
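The description credits ShuffleNet for the lower compute cost. As a rough illustration only, the sketch below shows the channel shuffle step that the ShuffleNet design uses to mix information between grouped convolutions, written in plain Python on a flat list of channel labels. The function name `channel_shuffle` and the toy labels are hypothetical; this record does not say which ShuffleNet variant the authors used or how they implemented it. The last lines simply divide the two compute budgets quoted in the description.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups (illustrative sketch, not the authors' code).

    `channels` is a flat list of per-channel items laid out group by group.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    # View the list as a (groups, per_group) grid, transpose it, then flatten.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# Six channels in two groups: [a0 a1 a2 | b0 b1 b2]
shuffled = channel_shuffle(["a0", "a1", "a2", "b0", "b1", "b2"], groups=2)
print(shuffled)  # ['a0', 'b0', 'a1', 'b1', 'a2', 'b2']

# Ratio of the two compute budgets quoted in the description.
flops_ratio = 3.5 / 1.01
print(f"{flops_ratio:.2f}x")  # 3.47x
```

After the shuffle, each group of the next grouped convolution sees channels from every previous group. The ratio shows that the quoted 1.01 GFLOPs budget is roughly 3.5 times smaller than the 3.5 GFLOPs residual baseline.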
first_indexed 2024-03-09T13:41:52Z
format Article
id doaj.art-f561d3046a5d4a629d5eb8062efd32f4
institution Directory of Open Access Journals
issn 2076-3417
spelling doaj.art-f561d3046a5d4a629d5eb8062efd32f4
2023-11-30T21:06:17Z
eng
MDPI AG
Applied Sciences
2076-3417
2023-01-01
Volume 13, Issue 2, Article 1106
10.3390/app13021106
Chinese Lip-Reading Research Based on ShuffleNet and CBAM
Yixian Fu, Yuanyao Lu, Ran Ni (all: School of Information Science and Technology, North China University of Technology, Beijing 100144, China)
https://www.mdpi.com/2076-3417/13/2/1106
Chinese lip-reading; ShuffleNet; CBAM; light-weight network
topic Chinese lip-reading
ShuffleNet
CBAM
light-weight network