Chinese Lip-Reading Research Based on ShuffleNet and CBAM
Lip reading has attracted increasing attention recently due to advances in deep learning. However, most research targets English datasets. The study of Chinese lip-reading technology is still in its initial stage. Firstly, in this paper, we expand the naturally distributed word-level Chinese dataset...
Main Authors: | Yixian Fu, Yuanyao Lu, Ran Ni |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-01-01 |
Series: | Applied Sciences |
Subjects: | Chinese lip-reading; ShuffleNet; CBAM; light-weight network |
Online Access: | https://www.mdpi.com/2076-3417/13/2/1106 |
_version_ | 1797446530221735936 |
---|---|
author | Yixian Fu, Yuanyao Lu, Ran Ni |
author_sort | Yixian Fu |
collection | DOAJ |
description | Lip reading has attracted increasing attention recently due to advances in deep learning. However, most research targets English datasets, and the study of Chinese lip-reading technology is still in its initial stage. Firstly, in this paper, we expand the naturally distributed word-level Chinese dataset called ‘Databox’, previously built by our laboratory. Secondly, the current state-of-the-art model consists of a residual network and a temporal convolutional network. The residual network incurs excessive computational cost and is not suitable for on-device applications. In the new model, the residual network is replaced with ShuffleNet, an extremely computation-efficient Convolutional Neural Network (CNN) architecture. Thirdly, to help the network focus on the most useful information, we insert a simple but effective attention module, the Convolutional Block Attention Module (CBAM), into ShuffleNet. In our experiment, we compare several model architectures and find that our model achieves accuracy comparable to that of the residual network (3.5 GFLOPs) under a computational budget of 1.01 GFLOPs. |
first_indexed | 2024-03-09T13:41:52Z |
format | Article |
id | doaj.art-f561d3046a5d4a629d5eb8062efd32f4 |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-09T13:41:52Z |
publishDate | 2023-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-f561d3046a5d4a629d5eb8062efd32f4; 2023-11-30T21:06:17Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2023-01-01; vol. 13, no. 2, art. 1106; 10.3390/app13021106; Chinese Lip-Reading Research Based on ShuffleNet and CBAM; Yixian Fu, Yuanyao Lu, Ran Ni; School of Information Science and Technology, North China University of Technology, Beijing 100144, China (all three authors); https://www.mdpi.com/2076-3417/13/2/1106; Chinese lip-reading; ShuffleNet; CBAM; light-weight network |
spellingShingle | Yixian Fu; Yuanyao Lu; Ran Ni; Chinese Lip-Reading Research Based on ShuffleNet and CBAM; Applied Sciences; Chinese lip-reading; ShuffleNet; CBAM; light-weight network |
title | Chinese Lip-Reading Research Based on ShuffleNet and CBAM |
title_sort | chinese lip reading research based on shufflenet and cbam |
topic | Chinese lip-reading; ShuffleNet; CBAM; light-weight network |
url | https://www.mdpi.com/2076-3417/13/2/1106 |
work_keys_str_mv | AT yixianfu chineselipreadingresearchbasedonshufflenetandcbam AT yuanyaolu chineselipreadingresearchbasedonshufflenetandcbam AT ranni chineselipreadingresearchbasedonshufflenetandcbam |
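
The description field names ShuffleNet as the replacement for the residual front end and quotes two compute budgets (3.5 GFLOPs and 1.01 GFLOPs). As a rough illustration only, the sketch below shows the channel shuffle operation that ShuffleNet-style networks use to mix information across channel groups, together with the ratio implied by the two quoted figures. It is not the authors' code: the function names `channel_shuffle` and `flops_reduction` are hypothetical, the channels are represented as a plain Python list rather than tensors, and the group count of 2 is an assumption chosen for the example.

```python
def channel_shuffle(channels, groups):
    """Reorder a flat list of channels the way a channel shuffle does.

    The list is viewed as a (groups, n) grid, transposed to (n, groups),
    and flattened, so each output group draws channels from every input group.
    """
    if len(channels) % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    n = len(channels) // groups
    # Split the flat list into `groups` consecutive chunks of length n.
    grid = [channels[g * n:(g + 1) * n] for g in range(groups)]
    # Read the grid column by column (the transpose), then flatten.
    return [grid[g][i] for i in range(n) for g in range(groups)]


def flops_reduction(baseline_gflops, model_gflops):
    """Return how many times cheaper the model is than the baseline."""
    return baseline_gflops / model_gflops


# Six channels in two groups (assumed sizes, for illustration only).
shuffled = channel_shuffle([0, 1, 2, 3, 4, 5], groups=2)
print(shuffled)  # [0, 3, 1, 4, 2, 5]

# Ratio of the two budgets quoted in the description field.
ratio = flops_reduction(3.5, 1.01)
print(round(ratio, 2))  # 3.47
```

Under these assumptions, the quoted budgets correspond to roughly a 3.5-fold reduction in compute. The CBAM attention module mentioned in the description is not sketched here.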