Audio–visual keyword transformer for unconstrained sentence‐level keyword spotting

Abstract: As one of the most effective methods to improve the accuracy and robustness of speech tasks, the audio–visual fusion approach has recently been introduced into the field of Keyword Spotting (KWS). However, existing audio–visual keyword spotting models are limited to detecting isolated words...

Bibliographic Details
Main Authors: Yidi Li, Jiale Ren, Yawei Wang, Guoquan Wang, Xia Li, Hong Liu
Format: Article
Language: English
Published: Wiley 2024-02-01
Series: CAAI Transactions on Intelligence Technology
Online Access: https://doi.org/10.1049/cit2.12212