Seeing wake words: Audio-visual keyword spotting

The goal of this work is to automatically determine whether and when a word of interest is spoken by a talking face, with or without the audio. We propose a zero-shot method suitable for ‘in the wild’ videos. Our key contributions are: (1) a novel convolutional architecture, KWS-Net, that uses a sim...

Bibliographic Details
Main Authors: Momeni, L, Afouras, T, Stafylakis, T, Albanie, S, Zisserman, A
Format: Conference item
Language: English
Published: British Machine Vision Association 2020