Seeing wake words: Audio-visual keyword spotting

The goal of this work is to automatically determine whether and when a word of interest is spoken by a talking face, with or without the audio. We propose a zero-shot method suitable for ‘in the wild’ videos. Our key contributions are: (1) a novel convolutional architecture, KWS-Net, that uses a sim...

Bibliographic Details
Main Authors:	Momeni, L, Afouras, T, Stafylakis, T, Albanie, S, Zisserman, A
Format:	Conference item
Language:	English
Published:	British Machine Vision Association 2020
