End-to-End Mispronunciation Detection and Diagnosis Using Transfer Learning

As an indispensable module of computer-aided pronunciation training (CAPT) systems, mispronunciation detection and diagnosis (MDD) techniques have attracted a lot of attention from academia and industry over the past decade. To train robust MDD models, this technique requires massive human-annotated...

Bibliographic Details
Main Authors:	Linkai Peng, Yingming Gao, Rian Bao, Ya Li, Jinsong Zhang
Format:	Article
Language:	English
Published:	MDPI AG 2023-06-01
Series:	Applied Sciences
Subjects:	mispronunciation detection and diagnosis (MDD); computer-aided pronunciation training (CAPT); transfer learning; pretrained model; text modulation gate
Online Access:	https://www.mdpi.com/2076-3417/13/11/6793

collection	DOAJ
description	As an indispensable module of computer-aided pronunciation training (CAPT) systems, mispronunciation detection and diagnosis (MDD) has attracted considerable attention from academia and industry over the past decade. Training robust MDD models requires large amounts of human-annotated speech recordings, which are usually expensive and often hard to acquire. In this study, we propose to use transfer learning to tackle the problem of data scarcity from two aspects. First, in the audio modality, we explore the use of the pretrained model wav2vec2.0 for MDD tasks to learn robust general acoustic representations. Second, in the text modality, we explore transferring prior texts into MDD by learning associations between the acoustic and textual modalities. We propose textual modulation gates that assign more importance to relevant text information while suppressing irrelevant text information. Moreover, given the transcriptions, we propose an extra contrastive loss to reduce the gap between the learning objectives of the phoneme recognition and MDD tasks. Experiments on the L2-Arctic dataset showed that our wav2vec2.0-based models outperformed the conventional methods. The proposed textual modulation gate and contrastive loss further improved the F1-score by more than 2.88%, and our best model achieved an F1-score of 61.75%.
id	doaj.art-8a65852fffec4f2ba8a8235923ddd845
institution	Directory Open Access Journal
issn	2076-3417
doi	10.3390/app13116793
volume	13
issue	11
article_number	6793
affiliations	Linkai Peng, Rian Bao, Jinsong Zhang: School of Information Science, Beijing Language and Culture University, Beijing 100083, China; Yingming Gao, Ya Li: School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China

Similar Items