Improved spell corrector algorithm and deepspeech2 model for enhancing end-to-end Gujarati language ASR performance

Bibliographic Details
Main Authors:	Bhavesh Bhagat, Mohit Dua
Format:	Article
Language:	English
Published:	Elsevier 2024-03-01
Series:	e-Prime: Advances in Electrical Engineering, Electronics and Energy
Subjects:	ASR; DeepSpeech2 model; BERT; Greedy or prefix beam search decoding; E2E; Spell corrector
Online Access:	http://www.sciencedirect.com/science/article/pii/S2772671124000238

author	Bhavesh Bhagat; Mohit Dua
collection	DOAJ
description	Automatic Speech Recognition (ASR) is the process of converting auditory signals into text representations of spoken words. In recent years, advancements in deep learning algorithms have resulted in the development of intricate architectures that considerably enhance the efficacy of End-to-End (E2E) ASR systems. Obtaining significant quantities of training data can be difficult, particularly for languages with limited resources, such as Gujarati. This article describes a novel method for improving ASR performance without the need for additional training data. The proposed method combines an enhanced orthography corrector algorithm with a DeepSpeech2 model architecture that employs Bidirectional Encoder Representations from Transformers and Gated Recurrent Units. Existing decoding strategies, such as greedy or prefix beam search, are improved upon by the algorithm used in this work. It employs post-processing techniques designed specifically for Gujarati language modifications. To train the model, high-quality, multi-speaker (male and female) Gujarati voice data has been gathered via crowd-sourcing, assuring that the most optimal parameter values are used. Word Error Rate (WER) has been reduced by a remarkable 17.20 % across the board. In addition, the study investigates various analytic techniques for identifying errors resulting from diacritics, consonants, independents, homophones, and half-conjugates. The overall efficacy of the ASR system is improved by obtaining a deeper understanding of the Gujarati language and implementing these techniques.
first_indexed	2024-03-08T11:23:26Z
format	Article
id	doaj.art-233016ed0e604a37aecdca402566bc01
institution	Directory of Open Access Journals
issn	2772-6711
language	English
last_indexed	2024-04-24T22:18:52Z
publishDate	2024-03-01
publisher	Elsevier
record_format	Article
series	e-Prime: Advances in Electrical Engineering, Electronics and Energy
spelling	doaj.art-233016ed0e604a37aecdca402566bc01; 2024-03-20T06:11:53Z; eng; Elsevier; e-Prime: Advances in Electrical Engineering, Electronics and Energy; ISSN 2772-6711; 2024-03-01; vol. 7, article 100441; Bhavesh Bhagat (corresponding author), Mohit Dua; Department of Computer Engineering, National Institute of Technology, Kurukshetra, India
title	Improved spell corrector algorithm and deepspeech2 model for enhancing end-to-end Gujarati language ASR performance
topic	ASR; DeepSpeech2 model; BERT; Greedy or prefix beam search decoding; E2E; Spell corrector
url	http://www.sciencedirect.com/science/article/pii/S2772671124000238

Similar Items