Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli


Bibliographic Details
Main Authors: Sodoyer David, Schwartz Jean-Luc, Girin Laurent, Klinkisch Jacob, Jutten Christian
Format: Article
Language: English
Published: SpringerOpen 2002-01-01
Series: EURASIP Journal on Advances in Signal Processing
Subjects: blind source separation, lipreading, audio-visual speech processing
Online Access: http://dx.doi.org/10.1155/S1110865702207015
author Sodoyer David
Schwartz Jean-Luc
Girin Laurent
Klinkisch Jacob
Jutten Christian
collection DOAJ
description We present a new approach to the source separation problem in the case of multiple speech signals. The method is based on automatic lipreading: the objective is to extract an acoustic speech signal from other acoustic signals by exploiting its coherence with the speaker's lip movements. We consider the case of an additive stationary mixture of decorrelated sources, with no further assumptions on independence or non-Gaussian character. First, we present a theoretical framework showing that it is indeed possible to separate a source when some of its spectral characteristics are provided to the system. Then we address the case of audio-visual sources. We show how, if a statistical model of the joint probability of visual and spectral audio input is learnt to quantify the audio-visual coherence, separation can be achieved by maximizing this probability. Finally, we present a number of separation results on a corpus of vowel-plosive-vowel sequences uttered by a single speaker, embedded in a mixture of other voices. We show that separation can be quite good for mixtures of 2, 3, and 5 sources. These results, while very preliminary, are encouraging, and are discussed with respect to their potential complementarity with traditional pure audio separation or enhancement techniques.
first_indexed 2024-12-10T10:33:04Z
format Article
id doaj.art-fe7866ac46bc45c3bcb1057210cc962a
institution Directory Open Access Journal
issn 1687-6172
1687-6180
language English
last_indexed 2024-12-10T10:33:04Z
publishDate 2002-01-01
publisher SpringerOpen
record_format Article
series EURASIP Journal on Advances in Signal Processing
spelling doaj.art-fe7866ac46bc45c3bcb1057210cc962a 2022-12-22T01:52:31Z eng SpringerOpen EURASIP Journal on Advances in Signal Processing 1687-6172 1687-6180 2002-01-01 200211382823 http://dx.doi.org/10.1155/S1110865702207015
title Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli
topic blind source separation
lipreading
audio-visual speech processing
url http://dx.doi.org/10.1155/S1110865702207015