Audio–visual keyword transformer for unconstrained sentence‐level keyword spotting

Abstract: As one of the most effective methods to improve the accuracy and robustness of speech tasks, the audio–visual fusion approach has recently been introduced into the field of Keyword Spotting (KWS). However, existing audio–visual keyword spotting models are limited to detecting isolated words...

Bibliographic Details
Main Authors:	Yidi Li, Jiale Ren, Yawei Wang, Guoquan Wang, Xia Li, Hong Liu
Format:	Article
Language:	English
Published:	Wiley 2024-02-01
Series:	CAAI Transactions on Intelligence Technology
Subjects:	artificial intelligence; multimodal approaches; natural language processing; neural network; speech processing
Online Access:	https://doi.org/10.1049/cit2.12212

Similar Items