A Light-Weight Autoregressive CNN-Based Frame Level Transducer Decoder for End-to-End ASR

A convolutional neural network (CNN) transducer decoder was proposed to reduce the decoding time of an end-to-end automatic speech recognition (ASR) system while maintaining accuracy. The CNN, with 177 k parameters and a kernel size of 6, generates the probabilities of the current token at the token lev...

Bibliographic Details
Main Authors:	Hyeon-Kyu Noh, Hong-June Park
Format:	Article
Language:	English
Published:	MDPI AG 2024-02-01
Series:	Applied Sciences
Subjects:	speech recognition autoregressive speech recognition end-to-end CNN transducer decoder
Online Access:	https://www.mdpi.com/2076-3417/14/3/1300

Similar Items

A review on speech recognition approaches and challenges for Portuguese: exploring the feasibility of fine-tuning large-scale end-to-end models
by: Yan Li, et al.
Published: (2025-01-01)

FrameAugment: A Simple Data Augmentation Method for Encoder–Decoder Speech Recognition
by: Seong-Su Lim, et al.
Published: (2022-07-01)

Fast offline transformer-based end-to-end automatic speech recognition for real-world applications
by: Yoo Rhee Oh, et al.
Published: (2022-06-01)

Improving End-to-End Models for Children’s Speech Recognition
by: Tanvina Patel, et al.
Published: (2024-03-01)

End-to-end audiovisual speech recognition based on attention fusion of SDBN and BLSTM
by: Yiming WANG, et al.
Published: (2019-12-01)

End-To-End deep neural models for Automatic Speech Recognition for Polish Language
by: Karolina Pondel-Sycz, et al.
Published: (2024-06-01)

Advancements in end-to-end isolated Kannada ASR system by combining robust noise elimination technique and TDNN
by: Yadava G. Thimmaraja, et al.
Published: (2023-11-01)

Ubranch Conformer: Integrating Up-Down Sampling and Branch Attention for Speech Recognition
by: Yang Yang, et al.
Published: (2024-01-01)

KsponSpeech: Korean Spontaneous Speech Corpus for Automatic Speech Recognition
by: Jeong-Uk Bang, et al.
Published: (2020-10-01)

An End-to-End Trainable Multi-Column CNN for Scene Recognition in Extremely Changing Environment
by: Zhenyu Li, et al.
Published: (2020-03-01)

MKD: Mixup-Based Knowledge Distillation for Mandarin End-to-End Speech Recognition
by: Xing Wu, et al.
Published: (2022-05-01)

Enhanced Conformer-Based Speech Recognition via Model Fusion and Adaptive Decoding with Dynamic Rescoring
by: Junhao Geng, et al.
Published: (2024-12-01)

A Bidirectional Context Embedding Transformer for Automatic Speech Recognition
by: Lyuchao Liao, et al.
Published: (2022-01-01)

AI-based language tutoring systems with end-to-end automatic speech recognition and proficiency evaluation
by: Byung Ok Kang, et al.
Published: (2024-02-01)

Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
by: Hawraz A. Ahmad, et al.
Published: (2024-07-01)

Contextual Biasing for End-to-End Chinese ASR
by: Kai Zhang, et al.
Published: (2024-01-01)

End-to-end scene text detection and recognition algorithm based on Transformer decoders
by: Jinzhi ZHENG, et al.
Published: (2023-05-01)

Dynamic Acoustic Unit Augmentation with BPE-Dropout for Low-Resource End-to-End Speech Recognition
by: Aleksandr Laptev, et al.
Published: (2021-04-01)

Attention-based latent features for jointly trained end-to-end automatic speech recognition with modified speech enhancement
by: Da-Hee Yang, et al.
Published: (2023-03-01)

End-to-end speech enhancement based on ultra-lightweight channel attention
by: Yi HONG, et al.
Published: (2021-09-01)

MPSA-Conformer-CTC/Attention: A High-Accuracy, Low-Complexity End-to-End Approach for Tibetan Speech Recognition
by: Changlin Wu, et al.
Published: (2024-10-01)

Variable Scale Pruning for Transformer Model Compression in End-to-End Speech Recognition
by: Leila Ben Letaifa, et al.
Published: (2023-08-01)

Effective Emotion Transplantation in an End-to-End Text-to-Speech System
by: Young-Sun Joo, et al.
Published: (2020-01-01)

Non-Autoregressive End-to-End Neural Modeling for Automatic Pronunciation Error Detection
by: Md. Anwar Hussen Wadud, et al.
Published: (2022-12-01)

LAS-Transformer: An Enhanced Transformer Based on the Local Attention Mechanism for Speech Recognition
by: Pengbin Fu, et al.
Published: (2022-05-01)

An End-To-End Speech Recognition Model for the North Shaanxi Dialect: Design and Evaluation
by: Yi Qin, et al.
Published: (2025-01-01)

Amharic OCR: An End-to-End Learning
by: Birhanu Belay, et al.
Published: (2020-02-01)

BERTIVITS: The Posterior Encoder Fusion of Pre-Trained Models and Residual Skip Connections for End-to-End Speech Synthesis
by: Zirui Wang, et al.
Published: (2024-06-01)

A Dual-Channel End-to-End Speech Enhancement Method Using Complex Operations in the Time Domain
by: Jian Pang, et al.
Published: (2023-06-01)

End-to-End Mandarin Speech Reconstruction Based on Ultrasound Tongue Images Using Deep Learning
by: Fengji Li, et al.
Published: (2025-01-01)

A Comparison of Hybrid and End-to-End ASR Systems for the IberSpeech-RTVE 2020 Speech-to-Text Transcription Challenge
by: Juan M. Perero-Codosero, et al.
Published: (2022-01-01)

Design of educational software for automatic speech recognition (ASR) techniques [UTM article journal]
by: Hong, Kai Sze, et al.

Streaming ASR Encoder for Whisper-to-Speech Online Voice Conversion
by: Anastasia Avdeeva, et al.
Published: (2024-01-01)

Method for visual analysis of driver's face for automatic lip-reading in the wild
by: A.A. Axyonov, et al.
Published: (2022-12-01)

An End-to-End Classifier Based on CNN for In-Air Handwritten-Chinese-Character Recognition
by: Mianjun Hu, et al.
Published: (2022-07-01)

Decoding the temporal dynamics of spoken word and nonword processing from EEG
by: Bob McMurray, et al.
Published: (2022-10-01)

Streaming End-to-End Target-Speaker Automatic Speech Recognition and Activity Detection
by: Takafumi Moriya, et al.
Published: (2023-01-01)

End-to-End Historical Handwritten Ethiopic Text Recognition Using Deep Learning
by: Ruchika Malhotra, et al.
Published: (2023-01-01)