Machine learning for lip reading

Bibliographic Details
Main Author: Zhao, Han
Other Authors: Andy Khong Wai Hoong
Format: Final Year Project (FYP)
Language: English
Published: 2018
Subjects:
Online Access: http://hdl.handle.net/10356/74671
Description
Summary: Lip-reading is one of the most challenging tasks in visual recognition. It decodes text from the movement of the speaker's lips. Earlier approaches divide the lip-reading problem into two stages, feature extraction and prediction, and use a Hidden Markov Model to handle the sequence problem. However, these traditional approaches require a lot of effort on feature extraction, and their models are trained to classify single words rather than whole sentences. This project aims to build an end-to-end, sentence-level lip-reading system using neural networks and deep learning. A convolutional neural network (CNN), a recurrent neural network (RNN) and connectionist temporal classification (CTC) will be implemented in the network. The GRID dataset is used in this project: several speech videos from it serve as training data.
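
For readers unfamiliar with CTC, the following is a minimal sketch of how per-frame network outputs can be turned into text with greedy (best-path) CTC decoding. It is an illustration only, not the project's implementation: the blank index, the label alphabet, and the helper names (`ctc_greedy_decode`, `label_of`, `one_hot`) are all assumptions made for this example, and the per-frame scores are hand-built rather than produced by a trained CNN/RNN.

```python
from typing import List, Sequence

# Assumptions of this sketch (not taken from the project):
BLANK = 0                                 # index reserved for the CTC blank symbol
ALPHABET = " abcdefghijklmnopqrstuvwxyz"  # hypothetical label set; label i maps to ALPHABET[i - 1]


def label_of(char: str) -> int:
    """Return the integer label for a character (labels start at 1; 0 is blank)."""
    return ALPHABET.index(char) + 1


def one_hot(label: int) -> List[float]:
    """Build a toy per-frame score vector that puts all mass on one label."""
    scores = [0.0] * (len(ALPHABET) + 1)
    scores[label] = 1.0
    return scores


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]]) -> str:
    """Greedy CTC decoding: pick the best label per frame, merge repeats, drop blanks."""
    # Step 1: take the highest-scoring label in every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]

    # Step 2: collapse consecutive duplicates, then remove blank labels.
    decoded = []
    previous = None
    for label in best_path:
        if label != previous and label != BLANK:
            decoded.append(ALPHABET[label - 1])
        previous = label
    return "".join(decoded)


# Seven video frames whose best path is "b b _ i i n _" (where _ is the blank).
frames = [
    one_hot(label_of("b")),
    one_hot(label_of("b")),
    one_hot(BLANK),
    one_hot(label_of("i")),
    one_hot(label_of("i")),
    one_hot(label_of("n")),
    one_hot(BLANK),
]
print(ctc_greedy_decode(frames))
```

The blank symbol is what lets the output contain genuinely repeated letters: two adjacent frames with the same label are merged into one character, while the same label separated by a blank is kept as two characters.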