Seeing wake words: Audio-visual keyword spotting
The goal of this work is to automatically determine whether and when a word of interest is spoken by a talking face, with or without the audio. We propose a zero-shot method suitable for ‘in the wild’ videos. Our key contributions are: (1) a novel convolutional architecture, KWS-Net, that uses a sim...
Main Authors: | , , , , |
---|---|
Format: | Conference item |
Language: | English |
Published: | British Machine Vision Association, 2020 |