Non-Autoregressive End-to-End Neural Modeling for Automatic Pronunciation Error Detection
A crucial element of computer-assisted pronunciation training (CAPT) systems is the mispronunciation detection and diagnosis (MDD) technique. The provided transcriptions can act as a teacher when evaluating the pronunciation quality of finite speech. The preceding texts have been entirely employed...
Main Authors: | Md. Anwar Hussen Wadud, Mohammed Alatiyyah, M. F. Mridha |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2022-12-01 |
Series: | Applied Sciences |
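Subjects: | non-autoregressive; pronunciation modeling; speech recognition; mispronunciation detection and diagnosis; attention; computer-assisted pronunciation training (CAPT) |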
Online Access: | https://www.mdpi.com/2076-3417/13/1/109 |
_version_ | 1797626323795968000 |
---|---|
author | Md. Anwar Hussen Wadud Mohammed Alatiyyah M. F. Mridha |
author_sort | Md. Anwar Hussen Wadud |
collection | DOAJ |
description | A crucial element of computer-assisted pronunciation training (CAPT) systems is the mispronunciation detection and diagnosis (MDD) technique. The provided transcriptions can act as a teacher when evaluating the pronunciation quality of finite speech. Conventional approaches, such as forced alignment and extended recognition networks, have relied entirely on these prior texts for model development or for enhancing system performance. End-to-end (E2E) approaches have recently attempted to incorporate the prior texts into model training, and preliminary results indicate efficacy. However, attention-based end-to-end models have shown lower speech recognition performance because multi-pass left-to-right forward computation constrains their practical applicability in beam search. In addition, end-to-end neural approaches are typically data-hungry, and a lack of non-native training data frequently impairs their effectiveness in MDD. To address these problems, we propose a novel MDD technique that uses non-autoregressive (NAR) end-to-end neural models to greatly reduce inference time while maintaining accuracy similar to that of conventional E2E neural models. Unlike autoregressive models, NAR models accept parallel inputs and generate token sequences in parallel instead of through left-to-right forward computation. To further enhance the effectiveness of MDD, we develop a pronunciation model superimposed on the NAR end-to-end models of our approach. To compare our approach against some of the best end-to-end models, we train and test on the publicly accessible L2-ARCTIC and SpeechOcean English datasets, on which the proposed model outperforms the other existing models. |
first_indexed | 2024-03-11T10:08:47Z |
format | Article |
id | doaj.art-5430f159229a4aa8941a473fec86f81a |
institution | Directory of Open Access Journals |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-11T10:08:47Z |
publishDate | 2022-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-5430f159229a4aa8941a473fec86f81a; 2023-11-16T14:50:45Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2022-12-01; vol. 13, no. 1, article 109; 10.3390/app13010109; Non-Autoregressive End-to-End Neural Modeling for Automatic Pronunciation Error Detection; Md. Anwar Hussen Wadud (0), Mohammed Alatiyyah (1), M. F. Mridha (2); (0) Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka 1216, Bangladesh; (1) Department of Computer Science, College of Sciences and Humanities in Aflaj, Prince Sattam Bin Abdulaziz University, Al-Kharj 16278, Saudi Arabia; (2) Department of Computer Science & Engineering, American International University of Bangladesh, Dhaka 1216, Bangladesh; https://www.mdpi.com/2076-3417/13/1/109; non-autoregressive; pronunciation modeling; speech recognition; mispronunciation detection and diagnosis; attention; computer-assisted pronunciation training (CAPT) |
title | Non-Autoregressive End-to-End Neural Modeling for Automatic Pronunciation Error Detection |
topic | non-autoregressive pronunciation modeling speech recognition mispronunciation detection and diagnosis attention computer-assisted pronunciation training (CAPT) |
url | https://www.mdpi.com/2076-3417/13/1/109 |
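
The description contrasts left-to-right (autoregressive) decoding with non-autoregressive (NAR) decoding, in which output tokens are produced in parallel. The following is a minimal sketch of that contrast only, not the authors' model: the phone inventory `PHONES`, the score table `FRAME_SCORES`, and the helper functions are all hypothetical and chosen purely for illustration.

```python
from typing import Callable, List, Sequence

# Hypothetical phone inventory and per-position scores (illustrative values only).
PHONES: List[str] = ["sil", "AH", "B", "K"]
FRAME_SCORES: List[List[float]] = [
    [0.1, 0.2, 0.6, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.2, 0.1, 0.1, 0.6],
]


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def ar_step(t: int, prev: int) -> List[float]:
    """Toy autoregressive scorer: the scores at step t depend on the previous token."""
    scores = list(FRAME_SCORES[t])
    scores[prev] -= 0.05  # assumed small penalty for repeating the previous token
    return scores


def nar_position(t: int) -> List[float]:
    """Toy NAR scorer: the scores at position t ignore all other output tokens."""
    return FRAME_SCORES[t]


def decode_autoregressive(step_fn: Callable[[int, int], List[float]],
                          num_steps: int, start_token: int = 0) -> List[int]:
    """Greedy left-to-right decoding: step t cannot start until step t-1 has finished."""
    tokens: List[int] = []
    prev = start_token
    for t in range(num_steps):
        prev = argmax(step_fn(t, prev))
        tokens.append(prev)
    return tokens


def decode_non_autoregressive(position_fn: Callable[[int], List[float]],
                              num_steps: int) -> List[int]:
    """Each position is decoded independently, so the iterations could run as one batch."""
    return [argmax(position_fn(t)) for t in range(num_steps)]


ar_phones = [PHONES[i] for i in decode_autoregressive(ar_step, len(FRAME_SCORES))]
nar_phones = [PHONES[i] for i in decode_non_autoregressive(nar_position, len(FRAME_SCORES))]
print(ar_phones, nar_phones)  # both: ['B', 'AH', 'K']
```

In this toy setting both decoders return the same phones; the difference is that the autoregressive loop carries `prev` from one step to the next, whereas the NAR positions share no state and can be evaluated simultaneously, which is the source of the reduced inference time the description refers to.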