Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images


Bibliographic Details
Main Authors: Iwano Koji, Yoshinaga Tomoaki, Tamura Satoshi, Furui Sadaoki
Format: Article
Language: English
Published: SpringerOpen 2007-01-01
Series:EURASIP Journal on Audio, Speech, and Music Processing
Online Access: http://asmp.eurasipjournals.com/content/2007/064506
Description
Summary: This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. The proposed method assumes that lip images can be captured by a small camera installed in a handset. Two kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built using the multistream HMM technique. Experiments conducted on Japanese connected-digit speech contaminated with white noise at various SNRs show the effectiveness of the proposed method: recognition accuracy is improved by using the visual information in all SNR conditions, and the visual features remain effective even when the audio HMM is adapted to noise by the MLLR method.
ISSN: 1687-4714, 1687-4722
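The multistream HMM technique named in the summary combines per-stream observation likelihoods with stream weights. A minimal sketch of that combination is shown below, assuming diagonal-covariance Gaussian output densities per stream and illustrative weights (the function names, models, and weight values here are hypothetical, not taken from the paper):

```python
import math

def gaussian_log_likelihood(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian (one HMM state's output model)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def multistream_log_score(audio_obs, visual_obs, audio_model, visual_model,
                          lambda_audio=0.7, lambda_visual=0.3):
    """Weighted per-stream log-likelihood combination used in multistream HMMs:
    log b(o) = lambda_A * log b_A(o_A) + lambda_V * log b_V(o_V).
    The 0.7/0.3 weights are illustrative, not the paper's tuned values."""
    log_a = gaussian_log_likelihood(audio_obs, *audio_model)
    log_v = gaussian_log_likelihood(visual_obs, *visual_model)
    return lambda_audio * log_a + lambda_visual * log_v

# Example: score a joint audio-visual observation against one state's models.
audio_model = ([0.0, 0.0], [1.0, 1.0])   # (mean, variance) per audio dimension
visual_model = ([0.0], [1.0])            # (mean, variance) per visual dimension
score = multistream_log_score([0.0, 0.0], [0.0], audio_model, visual_model)
```

In practice the stream weights would be tuned per SNR condition, since noisier audio should shift weight toward the visual stream.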