A Light-Weight Autoregressive CNN-Based Frame Level Transducer Decoder for End-to-End ASR

A convolutional neural network (CNN) transducer decoder was proposed to reduce the decoding time of an end-to-end automatic speech recognition (ASR) system while maintaining accuracy. The CNN, with 177 k parameters and a kernel size of 6, generates the probabilities of the current token at the token lev...

Bibliographic Details
Main Authors: Hyeon-Kyu Noh, Hong-June Park
Format: Article
Language: English
Published: MDPI AG 2024-02-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/14/3/1300