Improving the Recognition Performance of Lip Reading Using the Concatenated Three Sequence Keyframe Image Technique
Main Authors: | L. Poomhiran, P. Meesad, S. Nuanmeesri |
---|---|
Format: | Article |
Language: | English |
Published: | D. G. Pylarinos, 2021-04-01 |
Series: | Engineering, Technology & Applied Science Research |
Online Access: | https://etasr.com/index.php/ETASR/article/view/4102 |
author | L. Poomhiran, P. Meesad, S. Nuanmeesri |
collection | DOAJ (Directory of Open Access Journals) |
description | This paper proposes a lip reading method based on convolutional neural networks applied to the Concatenated Three Sequence Keyframe Image (C3-SKI). A C3-SKI consists of (a) the Start-Lip Image (SLI), (b) the Middle-Lip Image (MLI), and (c) the End-Lip Image (ELI), which marks the end of the syllable's pronunciation. The lip-region images were reduced to 32×32 pixels per frame. Three keyframes were then concatenated into a single 96×32-pixel image that represents one syllable for visual speech recognition. The three keyframes for each syllable are selected from the relative maxima and relative minima of the open lip's width and height. On the THDigits dataset, the model achieved an accuracy of 95.06%, a validation accuracy of 86.03%, a loss of 4.61%, and a validation loss of 9.04%. Applied to the AVDigits dataset, the C3-SKI technique achieved 85.62% accuracy. In conclusion, the C3-SKI technique could be applied to lip reading recognition. |
id | doaj.art-690dd27059614fec82759a4e582690ee |
issn | 2241-4487, 1792-8036 |
doi | 10.48084/etasr.4102 |
volume / issue | 11 / 2 |
affiliation | L. Poomhiran, P. Meesad: Faculty of Information Technology and Digital Innovation, King Mongkut’s University of Technology North Bangkok, Thailand; S. Nuanmeesri: Faculty of Science and Technology, Suan Sunandha Rajabhat University, Thailand |
topic | concatenated frame images, convolutional neural network, keyframe reduction, keyframe sequence, lip reading |
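The record describes the C3-SKI preprocessing only at a high level: pick start, middle, and end keyframes from relative extrema of the lip opening, reduce each to 32×32 pixels, and concatenate the three into one 96×32 image. The Python sketch below illustrates that pipeline under stated assumptions. It is not the authors' implementation. Several choices are labelled assumptions. The per-frame opening score is width × height. The selection rule takes the peak opening as the MLI and the smallest openings before and after that peak as the SLI and ELI. Resizing uses nearest-neighbour sampling. The frames are joined left to right, because 96×32 is read here as width × height and the abstract does not state the axis. All function names (`lip_opening`, `select_keyframes`, `resize_nearest`, `concat_keyframes`, `build_c3ski`) are hypothetical.

```python
from typing import List, Sequence, Tuple

# A grayscale frame stored as rows of pixel intensities (rows x columns).
Frame = List[List[int]]


def lip_opening(width: float, height: float) -> float:
    """Hypothetical per-frame opening score: area of the lip bounding box."""
    return width * height


def select_keyframes(
    widths: Sequence[float], heights: Sequence[float]
) -> Tuple[int, int, int]:
    """Return (start, middle, end) frame indices for one syllable.

    Assumed rule, not the paper's exact criterion: the middle keyframe is the
    frame with the largest lip opening, and the start/end keyframes are the
    frames with the smallest opening before and after that peak.
    """
    if len(widths) != len(heights) or len(widths) == 0:
        raise ValueError("widths and heights must be non-empty and equal length")
    scores = [lip_opening(w, h) for w, h in zip(widths, heights)]
    mid = max(range(len(scores)), key=scores.__getitem__)
    start = min(range(0, mid + 1), key=scores.__getitem__)
    end = min(range(mid, len(scores)), key=scores.__getitem__)
    return start, mid, end


def resize_nearest(img: Frame, size: int = 32) -> Frame:
    """Resample a frame to size x size by nearest-neighbour sampling (assumed method)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def concat_keyframes(frames: Sequence[Frame], size: int = 32) -> Frame:
    """Join three size x size frames left to right: size rows, 3*size columns."""
    if len(frames) != 3:
        raise ValueError("C3-SKI uses exactly three keyframes")
    for f in frames:
        if len(f) != size or any(len(row) != size for row in f):
            raise ValueError(f"each keyframe must be {size}x{size}")
    return [frames[0][r] + frames[1][r] + frames[2][r] for r in range(size)]


def build_c3ski(video: Sequence[Frame], widths: Sequence[float],
                heights: Sequence[float], size: int = 32) -> Frame:
    """Select SLI/MLI/ELI, resize each to size x size, and concatenate them."""
    s, m, e = select_keyframes(widths, heights)
    return concat_keyframes([resize_nearest(video[i], size) for i in (s, m, e)], size)


if __name__ == "__main__":
    widths = [10, 12, 20, 25, 18, 11, 13]
    heights = [2, 3, 8, 10, 6, 2, 4]
    # Dummy frames of 48 rows x 64 columns whose pixels all equal the frame index.
    video = [[[i] * 64 for _ in range(48)] for i in range(len(widths))]
    print(select_keyframes(widths, heights))  # (0, 3, 5)
    image = build_c3ski(video, widths, heights)
    print(len(image), len(image[0]))  # 32 96
```

Returning frame indices instead of images keeps selection separate from resizing. This makes it easy to substitute a different extremum rule, for example one driven by `scipy.signal.argrelextrema` on separate width and height signals. The resulting 32×96 array would then be the input to whatever CNN is used for classification.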