Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli
We present a new approach to the source separation problem in the case of multiple speech signals. The method is based on automatic lipreading; the objective is to extract an acoustic speech signal from other acoustic signals by exploiting its coherence with the speake...
Main Authors: | Sodoyer David, Schwartz Jean-Luc, Girin Laurent, Klinkisch Jacob, Jutten Christian |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen 2002-01-01 |
Series: | EURASIP Journal on Advances in Signal Processing |
Subjects: | blind source separation, lipreading, audio-visual speech processing |
Online Access: | http://dx.doi.org/10.1155/S1110865702207015 |
_version_ | 1818049154140602368 |
---|---|
author | Sodoyer David Schwartz Jean-Luc Girin Laurent Klinkisch Jacob Jutten Christian |
author_facet | Sodoyer David Schwartz Jean-Luc Girin Laurent Klinkisch Jacob Jutten Christian |
author_sort | Sodoyer David |
collection | DOAJ |
description | We present a new approach to the source separation problem in the case of multiple speech signals. The method is based on automatic lipreading; the objective is to extract an acoustic speech signal from other acoustic signals by exploiting its coherence with the speaker's lip movements. We consider the case of an additive stationary mixture of decorrelated sources, with no further assumptions on independence or non-Gaussian character. First, we present a theoretical framework showing that it is indeed possible to separate a source when some of its spectral characteristics are provided to the system. Then we address the case of audio-visual sources. We show how, if a statistical model of the joint probability of visual and spectral audio input is learnt to quantify the audio-visual coherence, separation can be achieved by maximizing this probability. Finally, we present a number of separation results on a corpus of vowel-plosive-vowel sequences uttered by a single speaker, embedded in a mixture of other voices. We show that separation can be quite good for mixtures of 2, 3, and 5 sources. These results, while very preliminary, are encouraging, and are discussed with respect to their potential complementarity with traditional pure audio separation or enhancement techniques. |
first_indexed | 2024-12-10T10:33:04Z |
format | Article |
id | doaj.art-fe7866ac46bc45c3bcb1057210cc962a |
institution | Directory Open Access Journal |
issn | 1687-6172 1687-6180 |
language | English |
last_indexed | 2024-12-10T10:33:04Z |
publishDate | 2002-01-01 |
publisher | SpringerOpen |
record_format | Article |
series | EURASIP Journal on Advances in Signal Processing |
spelling | doaj.art-fe7866ac46bc45c3bcb1057210cc962a2022-12-22T01:52:31ZengSpringerOpenEURASIP Journal on Advances in Signal Processing1687-61721687-61802002-01-01200211382823Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech StimuliSodoyer DavidSchwartz Jean-LucGirin LaurentKlinkisch JacobJutten Christian We present a new approach to the source separation problem in the case of multiple speech signals. The method is based on the use of automatic lipreading, the objective is to extract an acoustic speech signal from other acoustic signals by exploiting its coherence with the speaker's lip movements. We consider the case of an additive stationary mixture of decorrelated sources, with no further assumptions on independence or non-Gaussian character. Firstly, we present a theoretical framework showing that it is indeed possible to separate a source when some of its spectral characteristics are provided to the system. Then we address the case of audio-visual sources. We show how, if a statistical model of the joint probability of visual and spectral audio input is learnt to quantify the audio-visual coherence, separation can be achieved by maximizing this probability. Finally, we present a number of separation results on a corpus of vowel-plosive-vowel sequences uttered by a single speaker, embedded in a mixture of other voices. We show that separation can be quite good for mixtures of 2, 3, and 5 sources. These results, while very preliminary, are encouraging, and are discussed in respect to their potential complementarity with traditional pure audio separation or enhancement techniques. http://dx.doi.org/10.1155/S1110865702207015blind source separationlipreadingaudio-visual speech processing |
spellingShingle | Sodoyer David Schwartz Jean-Luc Girin Laurent Klinkisch Jacob Jutten Christian Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli EURASIP Journal on Advances in Signal Processing blind source separation lipreading audio-visual speech processing |
title | Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli |
title_full | Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli |
title_fullStr | Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli |
title_full_unstemmed | Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli |
title_short | Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli |
title_sort | separation of audio visual speech sources a new approach exploiting the audio visual coherence of speech stimuli |
topic | blind source separation lipreading audio-visual speech processing |
url | http://dx.doi.org/10.1155/S1110865702207015 |
work_keys_str_mv | AT sodoyerdavid separationofaudiovisualspeechsourcesanewapproachexploitingtheaudiovisualcoherenceofspeechstimuli AT schwartzjeanluc separationofaudiovisualspeechsourcesanewapproachexploitingtheaudiovisualcoherenceofspeechstimuli AT girinlaurent separationofaudiovisualspeechsourcesanewapproachexploitingtheaudiovisualcoherenceofspeechstimuli AT klinkischjacob separationofaudiovisualspeechsourcesanewapproachexploitingtheaudiovisualcoherenceofspeechstimuli AT juttenchristian separationofaudiovisualspeechsourcesanewapproachexploitingtheaudiovisualcoherenceofspeechstimuli |