A Light-Weight Autoregressive CNN-Based Frame Level Transducer Decoder for End-to-End ASR
A convolutional neural network (CNN) transducer decoder was proposed to reduce the decoding time of an end-to-end automatic speech recognition (ASR) system while maintaining accuracy. The CNN, with 177k parameters and a kernel size of 6, generates the probabilities of the current token at the token lev...
Main Authors: | Hyeon-Kyu Noh, Hong-June Park |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-02-01 |
Series: | Applied Sciences |
Online Access: | https://www.mdpi.com/2076-3417/14/3/1300 |
Similar Items
- A review on speech recognition approaches and challenges for Portuguese: exploring the feasibility of fine-tuning large-scale end-to-end models
  by: Yan Li, et al.
  Published: (2025-01-01)
- FrameAugment: A Simple Data Augmentation Method for Encoder–Decoder Speech Recognition
  by: Seong-Su Lim, et al.
  Published: (2022-07-01)
- Fast offline transformer-based end-to-end automatic speech recognition for real-world applications
  by: Yoo Rhee Oh, et al.
  Published: (2022-06-01)
- Improving End-to-End Models for Children’s Speech Recognition
  by: Tanvina Patel, et al.
  Published: (2024-03-01)
- End-to-end audiovisual speech recognition based on attention fusion of SDBN and BLSTM
  by: Yiming WANG, et al.
  Published: (2019-12-01)