Improving Amharic Speech Recognition System Using Connectionist Temporal Classification with Attention Model and Phoneme-Based Byte-Pair-Encodings

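Out-of-vocabulary (OOV) words are the most challenging problem in automatic speech recognition (ASR), especially for morphologically rich languages. Most end-to-end speech recognition systems operate at the word or character level of a language. Amharic is a poorly resourced but morphologically rich language. This paper proposes a hybrid connectionist temporal classification (CTC) with attention end-to-end architecture and a syllabification algorithm for an Amharic automatic speech recognition (AASR) system using its phoneme-based subword units. The syllabification algorithm helps to insert the epithetic vowel እ[ɨ], which is not included in our Grapheme-to-Phoneme (G2P) conversion algorithm developed using consonant–vowel (CV) representations of Amharic graphemes. The proposed end-to-end model was trained on various Amharic subword units, namely characters, phonemes, character-based subwords, and phoneme-based subwords generated by the byte-pair-encoding (BPE) segmentation algorithm. Experimental results showed that context-dependent phoneme-based subwords tend to result in more accurate speech recognition systems than their character-based, phoneme-based, and character-based subword counterparts. Further improvement was obtained by combining the proposed phoneme-based subwords with the syllabification algorithm and the SpecAugment data augmentation technique. The word error rate (WER) reduction was 18.38% compared to character-based acoustic modeling with the word-based recurrent neural network language model (RNNLM) baseline. These phoneme-based subword models are also useful for improving machine and speech translation tasks.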

Bibliographic Details
Main Authors:	Eshete Derb Emiru, Shengwu Xiong, Yaxing Li, Awet Fesseha, Moussa Diallo
Format:	Article
Language:	English
Published:	MDPI AG 2021-02-01
Series:	Information
Subjects:	Amharic, automatic speech recognition, connectionist temporal classification with attention, natural language processing, low resource language, out-of-vocabulary
Online Access:	https://www.mdpi.com/2078-2489/12/2/62

_version_	1827604337316069376
author	Eshete Derb Emiru, Shengwu Xiong, Yaxing Li, Awet Fesseha, Moussa Diallo
author_sort	Eshete Derb Emiru
collection	DOAJ
description	Out-of-vocabulary (OOV) words are the most challenging problem in automatic speech recognition (ASR), especially for morphologically rich languages. Most end-to-end speech recognition systems operate at the word or character level of a language. Amharic is a poorly resourced but morphologically rich language. This paper proposes a hybrid connectionist temporal classification (CTC) with attention end-to-end architecture and a syllabification algorithm for an Amharic automatic speech recognition (AASR) system using its phoneme-based subword units. The syllabification algorithm helps to insert the epithetic vowel እ[ɨ], which is not included in our Grapheme-to-Phoneme (G2P) conversion algorithm developed using consonant–vowel (CV) representations of Amharic graphemes. The proposed end-to-end model was trained on various Amharic subword units, namely characters, phonemes, character-based subwords, and phoneme-based subwords generated by the byte-pair-encoding (BPE) segmentation algorithm. Experimental results showed that context-dependent phoneme-based subwords tend to result in more accurate speech recognition systems than their character-based, phoneme-based, and character-based subword counterparts. Further improvement was obtained by combining the proposed phoneme-based subwords with the syllabification algorithm and the SpecAugment data augmentation technique. The word error rate (WER) reduction was 18.38% compared to character-based acoustic modeling with the word-based recurrent neural network language model (RNNLM) baseline. These phoneme-based subword models are also useful for improving machine and speech translation tasks.
first_indexed	2024-03-09T05:58:38Z
format	Article
id	doaj.art-c5dea3e50f95403bbe981c2f464cb5a8
institution	Directory Open Access Journal
issn	2078-2489
language	English
last_indexed	2024-03-09T05:58:38Z
publishDate	2021-02-01
publisher	MDPI AG
record_format	Article
series	Information
spelling	doaj.art-c5dea3e50f95403bbe981c2f464cb5a8; 2023-12-03T12:11:57Z; eng; MDPI AG; Information; ISSN 2078-2489; 2021-02-01; vol. 12, no. 2, art. 62; DOI 10.3390/info12020062; Improving Amharic Speech Recognition System Using Connectionist Temporal Classification with Attention Model and Phoneme-Based Byte-Pair-Encodings; Eshete Derb Emiru, Shengwu Xiong, Yaxing Li, Awet Fesseha, Moussa Diallo (all authors: School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China); https://www.mdpi.com/2078-2489/12/2/62; Amharic, automatic speech recognition, connectionist temporal classification with attention, natural language processing, low resource language, out-of-vocabulary
title	Improving Amharic Speech Recognition System Using Connectionist Temporal Classification with Attention Model and Phoneme-Based Byte-Pair-Encodings
topic	Amharic, automatic speech recognition, connectionist temporal classification with attention, natural language processing, low resource language, out-of-vocabulary
url	https://www.mdpi.com/2078-2489/12/2/62

Similar Items