Attention-based latent features for jointly trained end-to-end automatic speech recognition with modified speech enhancement

In this paper, we propose a joint training framework that efficiently combines time-domain speech enhancement (SE) with an end-to-end (E2E) automatic speech recognition (ASR) system utilizing attention-based latent features. Using the latent feature to train E2E ASR implies that various time-domain...

Bibliographic Details
Main Authors:	Da-Hee Yang, Joon-Hyuk Chang
Format:	Article
Language:	English
Published:	Elsevier 2023-03-01
Series:	Journal of King Saud University: Computer and Information Sciences
Subjects:	Time-domain speech enhancement; End-to-end automatic speech recognition; Attention-based latent feature; Joint training framework
Online Access:	http://www.sciencedirect.com/science/article/pii/S1319157823000368

Similar Items

KsponSpeech: Korean Spontaneous Speech Corpus for Automatic Speech Recognition
by: Jeong-Uk Bang, et al.
Published: (2020-10-01)

Improving End-to-End Models for Children’s Speech Recognition
by: Tanvina Patel, et al.
Published: (2024-03-01)

A Bidirectional Context Embedding Transformer for Automatic Speech Recognition
by: Lyuchao Liao, et al.
Published: (2022-01-01)

A Dual-Channel End-to-End Speech Enhancement Method Using Complex Operations in the Time Domain
by: Jian Pang, et al.
Published: (2023-06-01)

Fast offline transformer-based end-to-end automatic speech recognition for real-world applications
by: Yoo Rhee Oh, et al.
Published: (2022-06-01)

End-to-End Mandarin Speech Recognition Combining CNN and BLSTM
by: Dong Wang, et al.
Published: (2019-05-01)

Accented Speech Recognition Based on End-to-End Domain Adversarial Training of Neural Networks
by: Hyeong-Ju Na, et al.
Published: (2021-09-01)

Effective Emotion Transplantation in an End-to-End Text-to-Speech System
by: Young-Sun Joo, et al.
Published: (2020-01-01)

AI-based language tutoring systems with end-to-end automatic speech recognition and proficiency evaluation
by: Byung Ok Kang, et al.
Published: (2024-02-01)

LAS-Transformer: An Enhanced Transformer Based on the Local Attention Mechanism for Speech Recognition
by: Pengbin Fu, et al.
Published: (2022-05-01)

Improving out of vocabulary words recognition accuracy for an end-to-end Russian speech recognition system
by: Andrei Yu. Andrusenko, et al.
Published: (2022-12-01)

MKD: Mixup-Based Knowledge Distillation for Mandarin End-to-End Speech Recognition
by: Xing Wu, et al.
Published: (2022-05-01)

End-to-End Noisy Speech Recognition Using Fourier and Hilbert Spectrum Features
by: Daria Vazhenina, et al.
Published: (2020-07-01)

A Comparison of Hybrid and End-to-End ASR Systems for the IberSpeech-RTVE 2020 Speech-to-Text Transcription Challenge
by: Juan M. Perero-Codosero, et al.
Published: (2022-01-01)

Variable Scale Pruning for Transformer Model Compression in End-to-End Speech Recognition
by: Leila Ben Letaifa, et al.
Published: (2023-08-01)

Dynamic Acoustic Unit Augmentation with BPE-Dropout for Low-Resource End-to-End Speech Recognition
by: Aleksandr Laptev, et al.
Published: (2021-04-01)

Cross-Language End-to-End Speech Recognition Research Based on Transfer Learning for the Low-Resource Tujia Language
by: Chongchong Yu, et al.
Published: (2019-02-01)

Augmented Latent Features of Deep Neural Network-Based Automatic Speech Recognition for Motor-Driven Robots
by: Moa Lee, et al.
Published: (2020-07-01)

Automatic Speech Recognition Method Based on Deep Learning Approaches for Uzbek Language
by: Abdinabi Mukhamadiyev, et al.
Published: (2022-05-01)

A Light-Weight Autoregressive CNN-Based Frame Level Transducer Decoder for End-to-End ASR
by: Hyeon-Kyu Noh, et al.
Published: (2024-02-01)

End-User Recommendations on LOGOMON - a Computer Based Speech Therapy System for Romanian Language
by: SCHIPOR, O. A., et al.
Published: (2010-11-01)

Techniques for noise robustness in automatic speech recognition /
by: Virtanen, Tuomas, et al.
Published: (c201)

JSUM: A Multitask Learning Speech Recognition Model for Jointly Supervised and Unsupervised Learning
by: Nurmemet Yolwas, et al.
Published: (2023-04-01)

A Low-Latency Streaming On-Device Automatic Speech Recognition System Using a CNN Acoustic Model on FPGA and a Language Model on Smartphone
by: Jaehyun Park, et al.
Published: (2022-06-01)

On the use of a taxonomy of time-frequency morphologies for automatic speech recognition /
by: De Mori, Renato, et al.

Multi‐stage attention network for monaural speech enhancement
by: Kunpeng Wang, et al.
Published: (2023-03-01)

The design and development of an educational software on automatic speech recognition /
by: Hong, Kai Sze
Published: (2004)

FrameAugment: A Simple Data Augmentation Method for Encoder–Decoder Speech Recognition
by: Seong-Su Lim, et al.
Published: (2022-07-01)

Sub-convolutional U-Net with transformer attention network for end-to-end single-channel speech enhancement
by: Sivaramakrishna Yecchuri, et al.
Published: (2024-02-01)

Speech recognition and speech synthesis design for students information services /
by: Ling, Kee Soon
Published: (2001)

End-to-End Automatic Pronunciation Error Detection Based on Improved Hybrid CTC/Attention Architecture
by: Long Zhang, et al.
Published: (2020-03-01)

Advancements in end-to-end isolated Kannada ASR system by combining robust noise elimination technique and TDNN
by: Yadava G. Thimmaraja, et al.
Published: (2023-11-01)

Speech synthesis and recognition /
by: Holmes, J. N.
Published: (1988)

Attention-based speech feature transfer between speakers
by: Hangbok Lee, et al.
Published: (2024-02-01)

Automatic speech and speaker recognition : large margin and kernel methods /
by: Keshet, Joseph, et al.
Published: (2009)

LWMD: A Comprehensive Compression Platform for End-to-End Automatic Speech Recognition Models
by: Yukun Liu, et al.
Published: (2023-01-01)

BanSpeech: A Multi-Domain Bangla Speech Recognition Benchmark Toward Robust Performance in Challenging Conditions
by: Ahnaf Mozib Samin, et al.
Published: (2024-01-01)

Characterizing Dysarthria Diversity for Automatic Speech Recognition: A Tutorial From the Clinical Perspective
by: Hannah P. Rowe, et al.
Published: (2022-04-01)

Streaming End-to-End Target-Speaker Automatic Speech Recognition and Activity Detection
by: Takafumi Moriya, et al.
Published: (2023-01-01)

Speech recognition and speech synthesis design for students information services [microfilm] /
by: Ling, Kee Soon
Published: (2001)