Target Speaker Extraction Using Attention-Enhanced Temporal Convolutional Network

When recording conversations, multiple people may be talking at once. While human listeners can filter out unwanted sounds, doing so is challenging for automatic speech recognition (ASR) systems and reduces their accuracy. To address this issue, preprocessing mechanisms such as speech separat...

Bibliographic Details
Main Authors: Jian-Hong Wang, Yen-Ting Lai, Tzu-Chiang Tai, Phuong Thi Le, Tuan Pham, Ze-Yu Wang, Yung-Hui Li, Jia-Ching Wang, Pao-Chi Chang
Format: Article
Language: English
Published: MDPI AG 2024-01-01
Series: Electronics
Online Access: https://www.mdpi.com/2079-9292/13/2/307