Improving the Recognition Performance of Lip Reading Using the Concatenated Three Sequence Keyframe Image Technique

Bibliographic Details
Main Authors:	L. Poomhiran, P. Meesad, S. Nuanmeesri
Format:	Article
Language:	English
Published:	D. G. Pylarinos 2021-04-01
Series:	Engineering, Technology & Applied Science Research
Subjects:	concatenated frame images; convolutional neural network; keyframe reduction; keyframe sequence; lip reading
Online Access:	https://etasr.com/index.php/ETASR/article/view/4102

collection	DOAJ
description	This paper proposes a lip reading method based on convolutional neural networks applied to the Concatenated Three Sequence Keyframe Image (C3-SKI), which consists of (a) the Start-Lip Image (SLI), (b) the Middle-Lip Image (MLI), and (c) the End-Lip Image (ELI), which marks the end of the syllable’s pronunciation. The lip-area image in each frame was reduced to 32×32 pixels, and three keyframes concatenated side by side were used to represent one syllable as a 96×32-pixel image for visual speech recognition. The three keyframes representing each syllable are selected from the relative maxima and relative minima of the open lip’s width and height. On the THDigits dataset, the model achieved accuracy, validation accuracy, loss, and validation loss values of 95.06%, 86.03%, 4.61%, and 9.04%, respectively. The C3-SKI technique was also applied to the AVDigits dataset, reaching 85.62% accuracy. In conclusion, the C3-SKI technique can be applied to lip reading recognition.
id	doaj.art-690dd27059614fec82759a4e582690ee
institution	Directory of Open Access Journals
issn	2241-4487, 1792-8036
doi	10.48084/etasr.4102
volume	11
issue	2
affiliation	L. Poomhiran, P. Meesad: Faculty of Information Technology and Digital Innovation, King Mongkut’s University of Technology North Bangkok, Thailand; S. Nuanmeesri: Faculty of Science and Technology, Suan Sunandha Rajabhat University, Thailand
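The keyframe selection and concatenation summarized in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the exact selection rule, so the code assumes that per-frame lip openness is width × height, that the MLI is the frame at the strongest relative maximum of that signal, and that the SLI and ELI are the nearest relative minima before and after it (falling back to the first and last frames). The function names `relative_extrema`, `select_keyframes`, `resize_nearest`, and `concat_keyframes` are hypothetical, images are plain nested lists of grayscale values, and nearest-neighbour sampling stands in for whatever resizing the paper actually used.

```python
"""Hypothetical sketch of C3-SKI-style keyframe selection and concatenation.

Assumptions (not taken from the paper): per-frame lip openness is
width * height; the Middle-Lip Image (MLI) is the frame at the strongest
relative maximum of openness; the Start-Lip (SLI) and End-Lip (ELI) images
are the nearest relative minima before and after it, falling back to the
first and last frames.
"""
from typing import List, Sequence, Tuple

Image = List[List[int]]  # grayscale image stored as rows of pixel values


def relative_extrema(signal: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Return indices of strict interior relative maxima and minima."""
    maxima: List[int] = []
    minima: List[int] = []
    for i in range(1, len(signal) - 1):
        if signal[i - 1] < signal[i] > signal[i + 1]:
            maxima.append(i)
        elif signal[i - 1] > signal[i] < signal[i + 1]:
            minima.append(i)
    return maxima, minima


def select_keyframes(widths: Sequence[float],
                     heights: Sequence[float]) -> Tuple[int, int, int]:
    """Pick (SLI, MLI, ELI) frame indices from lip width/height tracks."""
    openness = [w * h for w, h in zip(widths, heights)]
    if not openness:
        raise ValueError("need at least one frame")
    maxima, minima = relative_extrema(openness)
    # MLI: most open relative maximum, or the global maximum if none exists.
    candidates = maxima if maxima else list(range(len(openness)))
    mid = max(candidates, key=lambda i: openness[i])
    # SLI/ELI: nearest relative minimum on each side of the MLI.
    start = max((i for i in minima if i < mid), default=0)
    end = min((i for i in minima if i > mid), default=len(openness) - 1)
    return start, mid, end


def resize_nearest(img: Image, size: int = 32) -> Image:
    """Resample a lip-region crop to size x size by nearest-neighbour lookup."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def concat_keyframes(frames: Sequence[Image], ids: Tuple[int, int, int],
                     size: int = 32) -> Image:
    """Place the SLI, MLI and ELI tiles side by side: size rows, 3*size columns."""
    tiles = [resize_nearest(frames[i], size) for i in ids]
    return [[px for tile in tiles for px in tile[r]] for r in range(size)]


if __name__ == "__main__":
    widths = [10, 12, 15, 18, 16, 13, 11, 12]
    heights = [2, 3, 5, 8, 6, 4, 2, 3]
    # Dummy 48x64 frames whose pixels all equal the frame index.
    frames = [[[t] * 64 for _ in range(48)] for t in range(len(widths))]
    ids = select_keyframes(widths, heights)
    c3 = concat_keyframes(frames, ids)
    print(ids, len(c3[0]), len(c3))  # (0, 3, 6) 96 32
```

With the sample tracks, openness peaks at frame 3 and the only relative minimum after it is frame 6, so the selected indices are (0, 3, 6) and the concatenated image is 96 pixels wide and 32 pixels high (3 × 32 = 96), matching the 96×32 size reported in the abstract. In practice, a library routine for relative extrema and an image library's resize function would likely replace the hand-written helpers.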

Similar Items