Improving the Recognition Performance of Lip Reading Using the Concatenated Three Sequence Keyframe Image Technique

This paper proposes a lip reading method based on convolutional neural networks applied to Concatenated Three Sequence Keyframe Image (C3-SKI), consisting of (a) the Start-Lip Image (SLI), (b) the Middle-Lip Image (MLI), and (c) the End-Lip Image (ELI) which is the end of the pronunciation of that s...

Bibliographic Details
Main Authors: L. Poomhiran, P. Meesad, S. Nuanmeesri
Format: Article
Language: English
Published: D. G. Pylarinos 2021-04-01
Series: Engineering, Technology & Applied Science Research
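Subjects: concatenated frame images, convolutional neural network, keyframe reduction, keyframe sequence, lip reading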
Online Access: https://etasr.com/index.php/ETASR/article/view/4102
author L. Poomhiran
P. Meesad
S. Nuanmeesri
collection DOAJ
description This paper proposes a lip reading method based on convolutional neural networks applied to the Concatenated Three Sequence Keyframe Image (C3-SKI), which consists of (a) the Start-Lip Image (SLI), (b) the Middle-Lip Image (MLI), and (c) the End-Lip Image (ELI), which marks the end of the pronunciation of that syllable. The lip area’s image dimensions were reduced to 32×32 pixels per image frame, and three keyframes concatenated together were used to represent one syllable with a dimension of 96×32 pixels for visual speech recognition. The three concatenated keyframes representing each syllable are selected based on the relative maximum and relative minimum of the open lip’s width and height. The evaluation of the model’s effectiveness showed accuracy, validation accuracy, loss, and validation loss values of 95.06%, 86.03%, 4.61%, and 9.04%, respectively, for the THDigits dataset. The C3-SKI technique was also applied to the AVDigits dataset, showing 85.62% accuracy. In conclusion, the C3-SKI technique could be applied to perform lip reading recognition.
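The concatenation step described in the abstract can be illustrated with a minimal sketch. It assumes grayscale frames held as NumPy arrays and horizontal concatenation (96 pixels wide by 32 pixels high); the function name `concatenate_keyframes` and the placeholder frames are hypothetical and are not taken from the paper. Keyframe selection itself is not shown, since the record gives no details beyond the use of relative maxima and minima.

```python
import numpy as np

FRAME_SIZE = 32  # each lip frame is 32x32 pixels, as stated in the abstract


def concatenate_keyframes(start, middle, end):
    """Join three 32x32 lip frames side by side into one 32-row by 96-column image.

    Horizontal concatenation is an assumption; the abstract only gives 96x32.
    """
    frames = [np.asarray(f) for f in (start, middle, end)]
    for f in frames:
        if f.shape != (FRAME_SIZE, FRAME_SIZE):
            raise ValueError(
                f"expected a {FRAME_SIZE}x{FRAME_SIZE} frame, got {f.shape}"
            )
    return np.hstack(frames)


# Constant-valued placeholder frames stand in for real SLI, MLI, and ELI crops.
sli = np.full((FRAME_SIZE, FRAME_SIZE), 0, dtype=np.uint8)
mli = np.full((FRAME_SIZE, FRAME_SIZE), 128, dtype=np.uint8)
eli = np.full((FRAME_SIZE, FRAME_SIZE), 255, dtype=np.uint8)

c3_ski = concatenate_keyframes(sli, mli, eli)
print(c3_ski.shape)  # (32, 96): 32 rows, 96 columns
```

The resulting array could then be passed to an image classifier as a single-channel input; the network architecture is outside what this record describes.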
first_indexed 2024-04-11T19:09:47Z
format Article
id doaj.art-690dd27059614fec82759a4e582690ee
institution Directory Open Access Journal
issn 2241-4487
1792-8036
language English
last_indexed 2024-04-11T19:09:47Z
publishDate 2021-04-01
publisher D. G. Pylarinos
record_format Article
series Engineering, Technology & Applied Science Research
spelling doaj.art-690dd27059614fec82759a4e582690ee
2022-12-22T04:07:39Z
eng
D. G. Pylarinos
Engineering, Technology & Applied Science Research
2241-4487
1792-8036
2021-04-01
vol. 11, no. 2
10.48084/etasr.4102
Improving the Recognition Performance of Lip Reading Using the Concatenated Three Sequence Keyframe Image Technique
L. Poomhiran (Faculty of Information Technology and Digital Innovation, King Mongkut’s University of Technology North Bangkok, Thailand)
P. Meesad (Faculty of Information Technology and Digital Innovation, King Mongkut’s University of Technology North Bangkok, Thailand)
S. Nuanmeesri (Faculty of Science and Technology, Suan Sunandha Rajabhat University, Thailand)
https://etasr.com/index.php/ETASR/article/view/4102
concatenated frame images
convolutional neural network
keyframe reduction
keyframe sequence
lip reading
title Improving the Recognition Performance of Lip Reading Using the Concatenated Three Sequence Keyframe Image Technique
topic concatenated frame images
convolutional neural network
keyframe reduction
keyframe sequence
lip reading
url https://etasr.com/index.php/ETASR/article/view/4102