Improved spell corrector algorithm and deepspeech2 model for enhancing end-to-end Gujarati language ASR performance
Automatic Speech Recognition (ASR) is the process of converting auditory signals into text representations of spoken words. In recent years, advancements in deep learning algorithms have resulted in the development of intricate architectures that considerably enhance the efficacy of End-to-End (E2E)...
Main Authors: | Bhavesh Bhagat, Mohit Dua |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2024-03-01 |
Series: | e-Prime: Advances in Electrical Engineering, Electronics and Energy |
Subjects: | ASR, DeepSpeech2 model, BERT, Greedy or prefix beam search decoding, E2E, Spell corrector |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2772671124000238 |
_version_ | 1797256257286963200 |
---|---|
author | Bhavesh Bhagat; Mohit Dua |
author_facet | Bhavesh Bhagat; Mohit Dua |
author_sort | Bhavesh Bhagat |
collection | DOAJ |
description | Automatic Speech Recognition (ASR) is the process of converting auditory signals into text representations of spoken words. In recent years, advancements in deep learning algorithms have resulted in the development of intricate architectures that considerably enhance the efficacy of End-to-End (E2E) ASR systems. Obtaining significant quantities of training data can be difficult, particularly for languages with limited resources, such as Gujarati. This article describes a novel method for improving ASR performance without the need for additional training data. The proposed method combines an enhanced orthography corrector algorithm with a DeepSpeech2 model architecture that employs Bidirectional Encoder Representations from Transformers and Gated Recurrent Units. Existing decoding strategies, such as greedy or prefix beam search, are improved upon by the algorithm used in this work. It employs post-processing techniques designed specifically for Gujarati language modifications. To train the model, high-quality, multi-speaker (male and female) Gujarati voice data has been gathered via crowd-sourcing, ensuring that optimal parameter values are used. Word Error Rate (WER) has been reduced by a remarkable 17.20% across the board. In addition, the study investigates various analytic techniques for identifying errors resulting from diacritics, consonants, independents, homophones, and half-conjugates. The overall efficacy of the ASR system is improved by obtaining a deeper understanding of the Gujarati language and implementing these techniques. |
first_indexed | 2024-03-08T11:23:26Z |
format | Article |
id | doaj.art-233016ed0e604a37aecdca402566bc01 |
institution | Directory Open Access Journal |
issn | 2772-6711 |
language | English |
last_indexed | 2024-04-24T22:18:52Z |
publishDate | 2024-03-01 |
publisher | Elsevier |
record_format | Article |
series | e-Prime: Advances in Electrical Engineering, Electronics and Energy |
spelling | doaj.art-233016ed0e604a37aecdca402566bc01; 2024-03-20T06:11:53Z; eng; Elsevier; e-Prime: Advances in Electrical Engineering, Electronics and Energy; 2772-6711; 2024-03-01; 7; 100441; Improved spell corrector algorithm and deepspeech2 model for enhancing end-to-end Gujarati language ASR performance; Bhavesh Bhagat (0); Mohit Dua (1); (0) Corresponding author; Department of Computer Engineering, National Institute of Technology, Kurukshetra, India; (1) Department of Computer Engineering, National Institute of Technology, Kurukshetra, India; http://www.sciencedirect.com/science/article/pii/S2772671124000238; ASR; DeepSpeech2 model; BERT; Greedy or prefix beam search decoding; E2E; Spell corrector |
title | Improved spell corrector algorithm and deepspeech2 model for enhancing end-to-end Gujarati language ASR performance |
topic | ASR; DeepSpeech2 model; BERT; Greedy or prefix beam search decoding; E2E; Spell corrector |
url | http://www.sciencedirect.com/science/article/pii/S2772671124000238 |
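
The description above reports its result as Word Error Rate (WER). As a minimal sketch of the standard definition (word-level edit distance divided by the number of reference words), assuming whitespace tokenization, WER can be computed as shown here. The function name `word_error_rate` is hypothetical and this is not the authors' evaluation code; the record does not say how the reported 17.20% reduction was measured.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference words.

    Illustrative sketch only; tokenization by whitespace is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

Because the metric compares whole words, a single wrong diacritic or conjunct makes the entire word count as an error, which is why a post-processing spell corrector can lower WER without retraining the acoustic model.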