Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images
This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a h...
Main Authors: | Iwano Koji, Yoshinaga Tomoaki, Tamura Satoshi, Furui Sadaoki |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen 2007-01-01 |
Series: | EURASIP Journal on Audio, Speech, and Music Processing |
Online Access: | http://asmp.eurasipjournals.com/content/2007/064506 |
_version_ | 1818344375636197376 |
---|---|
author | Iwano Koji Yoshinaga Tomoaki Tamura Satoshi Furui Sadaoki |
author_facet | Iwano Koji Yoshinaga Tomoaki Tamura Satoshi Furui Sadaoki |
author_sort | Iwano Koji |
collection | DOAJ |
description | This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise in various SNR conditions show the effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method. |
first_indexed | 2024-12-13T16:45:29Z |
format | Article |
id | doaj.art-66d46aef8fa34c1e81f3ac49d4f18e40 |
institution | Directory Open Access Journal |
issn | 1687-4714 1687-4722 |
language | English |
last_indexed | 2024-12-13T16:45:29Z |
publishDate | 2007-01-01 |
publisher | SpringerOpen |
record_format | Article |
series | EURASIP Journal on Audio, Speech, and Music Processing |
spelling | doaj.art-66d46aef8fa34c1e81f3ac49d4f18e402022-12-21T23:38:10ZengSpringerOpenEURASIP Journal on Audio, Speech, and Music Processing1687-47141687-47222007-01-0120071064506Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face ImagesIwano KojiYoshinaga TomoakiTamura SatoshiFurui Sadaoki This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise in various SNR conditions show effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method. http://asmp.eurasipjournals.com/content/2007/064506 |
spellingShingle | Iwano Koji Yoshinaga Tomoaki Tamura Satoshi Furui Sadaoki Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images EURASIP Journal on Audio, Speech, and Music Processing |
title | Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images |
title_full | Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images |
title_fullStr | Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images |
title_full_unstemmed | Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images |
title_short | Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images |
title_sort | audio visual speech recognition using lip information extracted from side face images |
url | http://asmp.eurasipjournals.com/content/2007/064506 |
work_keys_str_mv | AT iwanokoji audiovisualspeechrecognitionusinglipinformationextractedfromsidefaceimages AT yoshinagatomoaki audiovisualspeechrecognitionusinglipinformationextractedfromsidefaceimages AT tamurasatoshi audiovisualspeechrecognitionusinglipinformationextractedfromsidefaceimages AT furuisadaoki audiovisualspeechrecognitionusinglipinformationextractedfromsidefaceimages |