Diverse Pose Lip-Reading Framework

Bibliographic Details
Main Authors: Naheed Akhter, Mushtaq Ali, Lal Hussain, Mohsin Shah, Toqeer Mahmood, Amjad Ali, Ala Al-Fuqaha
Format: Article
Language: English
Published: MDPI AG 2022-09-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/12/19/9532
Description
Summary: Lip-reading is a technique for understanding speech by observing a speaker's lip movements. It has numerous applications; for example, it helps hearing-impaired persons and aids speech understanding in noisy environments. Most previous lip-reading works focused on frontal and near-frontal faces, and some targeted multiple poses in high-quality videos. However, their results are not satisfactory on low-quality videos containing multiple poses. In this research work, a lip-reading framework is proposed to improve the recognition rate on low-quality videos. A Multiple Pose (MP) dataset of low-quality videos containing multiple extreme poses is built. The proposed framework decomposes the input video into frames and enhances them with the Contrast Limited Adaptive Histogram Equalization (CLAHE) method. Next, faces are detected in the enhanced frames, and the multiple poses are frontalized using the face frontalization Generative Adversarial Network (FF-GAN). After face frontalization, the mouth region is extracted. The mouth regions extracted from the whole video, together with their respective sentences, are then provided to the ResNet during training. The proposed framework achieved a sentence prediction accuracy of 90% on a testing dataset of 100 silent low-quality videos with multiple poses, which is better than state-of-the-art methods.
ISSN:2076-3417
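
The summary above outlines a frame-level preprocessing pipeline: CLAHE enhancement, face detection, pose frontalization, mouth-region extraction, and ResNet training. The sketch below illustrates the preprocessing steps only, assuming OpenCV (cv2) with a Haar cascade face detector and a lower-face heuristic for the mouth crop; these choices, and the function name preprocess_video, are illustrative assumptions rather than the paper's implementation, and the FF-GAN and ResNet stages are indicated only as placeholders.

import cv2

def preprocess_video(path):
    """Decompose a video into CLAHE-enhanced frames and crop rough mouth regions."""
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    cap = cv2.VideoCapture(path)
    mouth_crops = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Step 1: contrast enhancement with CLAHE on the grayscale frame.
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        enhanced = clahe.apply(gray)
        # Step 2: face detection on the enhanced frame.
        faces = detector.detectMultiScale(enhanced, scaleFactor=1.1, minNeighbors=5)
        for (x, y, w, h) in faces:
            # Step 3 (placeholder): frontalize the detected face with FF-GAN here.
            # Step 4: crop a rough mouth region (lower third of the face box,
            # a heuristic stand-in for the paper's mouth extraction).
            mouth = enhanced[y + 2 * h // 3 : y + h, x : x + w]
            mouth_crops.append(mouth)
    cap.release()
    # Step 5 (placeholder): feed the sequence of mouth crops, paired with the
    # sentence label, to a ResNet-based recognizer during training.
    return mouth_crops

Under these assumptions, calling preprocess_video on one low-quality clip would yield the ordered sequence of mouth crops that the recognition network consumes alongside its sentence label.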