Automatic Speech Recognition Using Limited Vocabulary: A Survey

Automatic Speech Recognition (ASR) is an active field of research due to its large number of applications and the proliferation of interfaces or computing devices that can support speech processing. However, the bulk of applications are based on well-resourced languages that overshadow under-resourc...

Bibliographic Details
Main Authors:	Jean Louis K. E Fendji, Diane C. M. Tala, Blaise O. Yenke, Marcellin Atemkeng
Format:	Article
Language:	English
Published:	Taylor & Francis Group 2022-12-01
Series:	Applied Artificial Intelligence
Online Access:	http://dx.doi.org/10.1080/08839514.2022.2095039

_version_	1797641099670454272
author	Jean Louis K. E Fendji; Diane C. M. Tala; Blaise O. Yenke; Marcellin Atemkeng
author_facet	Jean Louis K. E Fendji; Diane C. M. Tala; Blaise O. Yenke; Marcellin Atemkeng
author_sort	Jean Louis K. E Fendji
collection	DOAJ
description	Automatic Speech Recognition (ASR) is an active field of research due to its large number of applications and the proliferation of interfaces or computing devices that can support speech processing. However, the bulk of applications are based on well-resourced languages that overshadow under-resourced ones. Yet, ASR represents an undeniable means to promote such languages, especially when designing human-to-human or human-to-machine systems involving illiterate people. One approach to designing an ASR system targeting under-resourced languages is to start with a limited vocabulary. ASR using a limited vocabulary is a subset of the speech recognition problem that focuses on the recognition of a small number of words or sentences. This paper aims to provide a comprehensive view of the mechanisms behind ASR systems as well as techniques, tools, projects, recent contributions, and possible future directions in ASR using a limited vocabulary. This work consequently provides a way forward when designing an ASR system using a limited vocabulary. Although the emphasis is on limited vocabulary, most of the tools and techniques reported in this survey can be applied to ASR systems in general.
Abbreviations: ACC: Accuracy; AM: Acoustic Model; ASR: Automatic Speech Recognition; BD-4SK-ASR: Basic Dataset for Sorani Kurdish Automatic Speech Recognition; CER: Character Error Rate; CMU: Carnegie Mellon University; CNN: Convolutional Neural Network; CNTK: CogNitive ToolKit; CUED: Cambridge University Engineering Department; DCT: Discrete Cosine Transformation; DL: Deep Learning; DNN: Deep Neural Network; DRL: Deep Reinforcement Learning; DWT: Discrete Wavelet Transform; FFT: Fast Fourier Transformation; GMM: Gaussian Mixture Model; HMM: Hidden Markov Model; HTK: Hidden Markov Model ToolKit; JASPER: Just Another Speech Recognizer; LDA: Linear Discriminant Analysis; LER: Letter Error Rate; LGB: Light Gradient Boosting Machine; LM: Language Model; LPC: Linear Predictive Coding; LVCSR: Large Vocabulary Continuous Speech Recognition; LVQ: Learning Vector Quantization Algorithm; MFCC: Mel-Frequency Cepstrum Coefficient; ML: Machine Learning; PCM: Pulse-Code Modulation; PPVT: Peabody Picture Vocabulary Test; RASTA: RelAtive SpecTral; RLAT: Rapid Language Adaptation Toolkit; S2ST: Speech-to-Speech Translation; SAPI: Speech Application Programming Interface; SDK: Software Development Kit; SVASR: Small Vocabulary Automatic Speech Recognition; WER: Word Error Rate
first_indexed	2024-03-11T13:40:39Z
format	Article
id	doaj.art-b2589f32d40d41cf9d5ba47939a773ca
institution	Directory of Open Access Journals
issn	0883-9514 1087-6545
language	English
last_indexed	2024-03-11T13:40:39Z
publishDate	2022-12-01
publisher	Taylor & Francis Group
record_format	Article
series	Applied Artificial Intelligence
spelling	doaj.art-b2589f32d40d41cf9d5ba47939a773ca; 2023-11-02T13:36:38Z; eng; Taylor & Francis Group; Applied Artificial Intelligence; 0883-9514; 1087-6545; 2022-12-01; 36; 1; 10.1080/08839514.2022.2095039; 2095039; Automatic Speech Recognition Using Limited Vocabulary: A Survey; Jean Louis K. E Fendji (0); Diane C. M. Tala (1); Blaise O. Yenke (2); Marcellin Atemkeng (3); University Institute of Technology, University of Ngaoundere; University of Ngaoundere; University Institute of Technology, University of Ngaoundere; Rhodes University; http://dx.doi.org/10.1080/08839514.2022.2095039
spellingShingle	Jean Louis K. E Fendji Diane C. M. Tala Blaise O. Yenke Marcellin Atemkeng Automatic Speech Recognition Using Limited Vocabulary: A Survey Applied Artificial Intelligence
title	Automatic Speech Recognition Using Limited Vocabulary: A Survey
title_full	Automatic Speech Recognition Using Limited Vocabulary: A Survey
title_fullStr	Automatic Speech Recognition Using Limited Vocabulary: A Survey
title_full_unstemmed	Automatic Speech Recognition Using Limited Vocabulary: A Survey
title_short	Automatic Speech Recognition Using Limited Vocabulary: A Survey
title_sort	automatic speech recognition using limited vocabulary a survey
url	http://dx.doi.org/10.1080/08839514.2022.2095039
work_keys_str_mv	AT jeanlouiskefendji automaticspeechrecognitionusinglimitedvocabularyasurvey AT dianecmtala automaticspeechrecognitionusinglimitedvocabularyasurvey AT blaiseoyenke automaticspeechrecognitionusinglimitedvocabularyasurvey AT marcellinatemkeng automaticspeechrecognitionusinglimitedvocabularyasurvey

Similar Items