Improved spell corrector algorithm and deepspeech2 model for enhancing end-to-end Gujarati language ASR performance

Bibliographic Details
Main Authors: Bhavesh Bhagat, Mohit Dua
Format: Article
Language: English
Published: Elsevier 2024-03-01
Series: e-Prime: Advances in Electrical Engineering, Electronics and Energy
Subjects: ASR; DeepSpeech2 model; BERT; Greedy or prefix beam search decoding; E2E; Spell corrector
Online Access: http://www.sciencedirect.com/science/article/pii/S2772671124000238
author Bhavesh Bhagat
Mohit Dua
collection DOAJ
description Automatic Speech Recognition (ASR) is the process of converting speech signals into a text representation of the spoken words. In recent years, advances in deep learning have produced complex architectures that considerably improve the performance of End-to-End (E2E) ASR systems. Obtaining large quantities of training data can be difficult, particularly for low-resource languages such as Gujarati. This article describes a novel method for improving ASR performance without additional training data. The proposed method combines an enhanced spell corrector algorithm with a DeepSpeech2 model architecture that employs Bidirectional Encoder Representations from Transformers (BERT) and Gated Recurrent Units (GRUs). The algorithm improves on existing decoding strategies, such as greedy or prefix beam search decoding, by applying post-processing techniques designed specifically for the Gujarati language. To train the model, high-quality, multi-speaker (male and female) Gujarati speech data was gathered via crowd-sourcing, and the model is trained with the best-performing parameter values. The Word Error Rate (WER) is reduced by 17.20 % overall. In addition, the study investigates analysis techniques for identifying errors arising from diacritics, consonants, independents, homophones, and half-conjugates. A deeper understanding of the Gujarati language, combined with these techniques, improves the overall performance of the ASR system.
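The description reports its result as Word Error Rate (WER). As a reminder of how that metric is usually computed, the sketch below divides the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and an ASR hypothesis by the number of reference words. The function name `word_error_rate` and whitespace tokenization are assumptions made for illustration; the record does not state how the authors tokenize Gujarati text or implement the metric.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only: tokens are obtained by splitting on
    whitespace, which is an assumption, not the article's procedure.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because the function only compares whitespace-separated tokens, it works unchanged on Gujarati script; a spell corrector applied to the hypothesis before scoring can lower the value only by turning mismatched words into exact matches.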
first_indexed 2024-03-08T11:23:26Z
format Article
id doaj.art-233016ed0e604a37aecdca402566bc01
institution Directory Open Access Journal
issn 2772-6711
language English
last_indexed 2024-04-24T22:18:52Z
publishDate 2024-03-01
publisher Elsevier
record_format Article
series e-Prime: Advances in Electrical Engineering, Electronics and Energy
spelling doaj.art-233016ed0e604a37aecdca402566bc01
2024-03-20T06:11:53Z
eng
Elsevier
e-Prime: Advances in Electrical Engineering, Electronics and Energy
2772-6711
2024-03-01
7
100441
Improved spell corrector algorithm and deepspeech2 model for enhancing end-to-end Gujarati language ASR performance
Bhavesh Bhagat (0)
Mohit Dua (1)
(0) Corresponding author.; Department of Computer Engineering, National Institute of Technology, Kurukshetra, India
(1) Department of Computer Engineering, National Institute of Technology, Kurukshetra, India
http://www.sciencedirect.com/science/article/pii/S2772671124000238
ASR
DeepSpeech2 model
BERT
Greedy or prefix beam search decoding
E2E
Spell corrector
title Improved spell corrector algorithm and deepspeech2 model for enhancing end-to-end Gujarati language ASR performance
topic ASR
DeepSpeech2 model
BERT
Greedy or prefix beam search decoding
E2E
Spell corrector
url http://www.sciencedirect.com/science/article/pii/S2772671124000238