Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a h...

Bibliographic Details
Main Authors: Iwano Koji, Yoshinaga Tomoaki, Tamura Satoshi, Furui Sadaoki
Format: Article
Language: English
Published: SpringerOpen 2007-01-01
Series: EURASIP Journal on Audio, Speech, and Music Processing
Online Access: http://asmp.eurasipjournals.com/content/2007/064506
_version_ 1818344375636197376
author Iwano Koji
Yoshinaga Tomoaki
Tamura Satoshi
Furui Sadaoki
author_facet Iwano Koji
Yoshinaga Tomoaki
Tamura Satoshi
Furui Sadaoki
author_sort Iwano Koji
collection DOAJ
description This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. The proposed method assumes that lip images can be captured using a small camera installed in a handset. Two kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built using the multistream HMM technique. Experiments conducted using Japanese connected-digit speech contaminated with white noise at various SNR conditions show the effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. The visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.
first_indexed 2024-12-13T16:45:29Z
format Article
id doaj.art-66d46aef8fa34c1e81f3ac49d4f18e40
institution Directory Open Access Journal
issn 1687-4714
1687-4722
language English
last_indexed 2024-12-13T16:45:29Z
publishDate 2007-01-01
publisher SpringerOpen
record_format Article
series EURASIP Journal on Audio, Speech, and Music Processing