Visual Speech Recognition for Kannada Language Using VGG16 Convolutional Neural Network

Visual speech recognition (VSR) is a method of reading speech by noticing the lip actions of the narrators. Visual speech significantly depends on the visual features derived from the image sequences. Visual speech recognition is a stimulating process that poses various challenging tasks to human ma...

Full description

Bibliographic Details
Main Authors: Shashidhar Rudregowda, Sudarshan Patil Kulkarni, Gururaj H L, Vinayakumar Ravi, Moez Krichen
Format: Article
Language:English
Published: MDPI AG 2023-03-01
Series:Acoustics
Subjects:
Online Access:https://www.mdpi.com/2624-599X/5/1/20
Description
Summary:Visual speech recognition (VSR) is a method of reading speech by noticing the lip actions of the narrators. Visual speech significantly depends on the visual features derived from the image sequences. Visual speech recognition is a stimulating process that poses various challenging tasks to human machine-based procedures. VSR methods clarify the tasks by using machine learning. Visual speech helps people who are hearing impaired, laryngeal patients, and are in a noisy environment. In this research, authors developed our dataset for the Kannada Language. The dataset contained five words, which are Avanu, Bagge, Bari, Guruthu, Helida, and these words are randomly chosen. The average duration of each video is 1 s to 1.2 s. The machine learning method is used for feature extraction and classification. Here, authors applied VGG16 Convolution Neural Network for our custom dataset, and relu activation function is used to get an accuracy of 91.90% and the recommended system confirms the effectiveness of the system. The proposed output is compared with HCNN, ResNet-LSTM, Bi-LSTM, and GLCM-ANN, and evidenced the effectiveness of the recommended system.
ISSN:2624-599X