Multi-Scale Feature Learning for Language Identification of Overlapped Speech

Bibliographic Details
Main Authors:	Zuhragvl Aysa, Mijit Ablimit, Askar Hamdulla
Format:	Article
Language:	English
Published:	MDPI AG 2023-03-01
Series:	Applied Sciences
Subjects:	language identification; spectrogram; overlapped speech; CNN; CBAM; SE-Res2Net
Online Access:	https://www.mdpi.com/2076-3417/13/7/4235

Description
Summary:	Language identification is the front end of multilingual speech-processing tasks. The study aims to enhance the accuracy of language identification in complex acoustic environments by proposing a multi-scale feature extraction method. This method replaces the baseline feature extraction network with a multi-scale feature extraction network (SE-Res2Net-CBAM-BILSTM) to extract multi-scale features. A multilingual cocktail party dataset was simulated, and comparative experiments were conducted with various models. The experimental results show that the proposed model achieved language identification accuracies of 97.6% on an Oriental language dataset and 75% on the multilingual cocktail party dataset. Furthermore, comparative experiments show that the proposed model outperformed three other models in accuracy, recall, and F1 score. Finally, a comparison of different loss functions shows that the model performed better when trained with focal loss.
ISSN:	2076-3417

Similar Items