Audio–visual keyword transformer for unconstrained sentence‐level keyword spotting
Abstract As one of the most effective methods to improve the accuracy and robustness of speech tasks, the audio–visual fusion approach has recently been introduced into the field of Keyword Spotting (KWS). However, existing audio–visual keyword spotting models are limited to detecting isolated words...
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: |
Wiley
2024-02-01
|
Series: | CAAI Transactions on Intelligence Technology |
Subjects: | |
Online Access: | https://doi.org/10.1049/cit2.12212 |