Non-Autoregressive End-to-End Neural Modeling for Automatic Pronunciation Error Detection

A crucial element of computer-assisted pronunciation training (CAPT) systems is the mispronunciation detection and diagnosis (MDD) technique. The provided transcriptions can act as a teacher when evaluating the pronunciation quality of finite speech. The preceding texts have been entirely employed...


Bibliographic Details
Main Authors: Md. Anwar Hussen Wadud, Mohammed Alatiyyah, M. F. Mridha
Format: Article
Language: English
Published: MDPI AG 2022-12-01
Series: Applied Sciences
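Subjects: non-autoregressive; pronunciation modeling; speech recognition; mispronunciation detection and diagnosis; attention; computer-assisted pronunciation training (CAPT)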
Online Access: https://www.mdpi.com/2076-3417/13/1/109
author Md. Anwar Hussen Wadud
Mohammed Alatiyyah
M. F. Mridha
collection DOAJ
description Mispronunciation detection and diagnosis (MDD) is a crucial element of computer-assisted pronunciation training (CAPT) systems. The provided transcriptions can act as a teacher when evaluating the pronunciation quality of finite speech. Conventional approaches, such as forced alignment and extended recognition networks, have relied entirely on these prior texts for model development or for enhancing system performance. End-to-end (E2E) approaches have recently attempted to incorporate the prior texts into model training, and preliminary results indicate that this is effective. However, attention-based end-to-end models have shown lower speech recognition performance because multi-pass left-to-right forward computation constrains their practical applicability in beam search. In addition, end-to-end neural approaches are typically data-hungry, and a lack of non-native training data frequently impairs their effectiveness in MDD. To address these problems, we propose an MDD technique that uses non-autoregressive (NAR) end-to-end neural models to greatly reduce estimation time while maintaining accuracy similar to that of traditional E2E neural models. Unlike autoregressive models, NAR models accept parallel inputs and generate token sequences in parallel rather than through left-to-right forward computation. To further enhance the effectiveness of MDD, we build a pronunciation model on top of the NAR end-to-end models. We train and test on the publicly accessible L2-ARCTIC and SpeechOcean English datasets and compare against some of the best end-to-end models; the proposed model achieves the best results among the models compared.
first_indexed 2024-03-11T10:08:47Z
format Article
id doaj.art-5430f159229a4aa8941a473fec86f81a
institution Directory of Open Access Journals
issn 2076-3417
language English
last_indexed 2024-03-11T10:08:47Z
publishDate 2022-12-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-5430f159229a4aa8941a473fec86f81a
2023-11-16T14:50:45Z
eng
MDPI AG
Applied Sciences
2076-3417
2022-12-01
volume 13, issue 1, article 109
10.3390/app13010109
Non-Autoregressive End-to-End Neural Modeling for Automatic Pronunciation Error Detection
Md. Anwar Hussen Wadud (0)
Mohammed Alatiyyah (1)
M. F. Mridha (2)
(0) Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka 1216, Bangladesh
(1) Department of Computer Science, College of Sciences and Humanities in Aflaj, Prince Sattam Bin Abdulaziz University, Al-Kharj 16278, Saudi Arabia
(2) Department of Computer Science & Engineering, American International University of Bangladesh, Dhaka 1216, Bangladesh
title Non-Autoregressive End-to-End Neural Modeling for Automatic Pronunciation Error Detection
topic non-autoregressive
pronunciation modeling
speech recognition
mispronunciation detection and diagnosis
attention
computer-assisted pronunciation training (CAPT)
url https://www.mdpi.com/2076-3417/13/1/109