Multi-Scale Feature Learning for Language Identification of Overlapped Speech

Language identification is the front end of multilingual speech-processing tasks. The study aims to enhance the accuracy of language identification in complex acoustic environments by proposing a multi-scale feature extraction method. This method replaces the baseline feature extraction network with...

Full description

Bibliographic Details
Main Authors: Zuhragvl Aysa, Mijit Ablimit, Askar Hamdulla
Format: Article
Language:English
Published: MDPI AG 2023-03-01
Series:Applied Sciences
Subjects:
Online Access:https://www.mdpi.com/2076-3417/13/7/4235
Description
Summary:Language identification is the front end of multilingual speech-processing tasks. The study aims to enhance the accuracy of language identification in complex acoustic environments by proposing a multi-scale feature extraction method. This method replaces the baseline feature extraction network with a multi-scale feature extraction network (SE-Res2Net-CBAM-BILSTM) to extract multi-scale features. A multilingual cocktail party dataset was simulated, and comparative experiments were conducted with various models. The experimental results show that the proposed model achieved language identification accuracies of 97.6% for an Oriental language dataset and 75% for a multilingual cocktail party dataset Furthermore, comparative experiments show that our model outperformed three other models in the accuracy, recall, and F1 values. Finally, a comparison of different loss functions shows that the model performance was better when using focal loss.
ISSN:2076-3417