Automatic Speech Recognition Using Limited Vocabulary: A Survey

Automatic Speech Recognition (ASR) is an active field of research due to its large number of applications and the proliferation of interfaces or computing devices that can support speech processing. However, the bulk of applications are based on well-resourced languages that overshadow under-resourc...

Bibliographic Details
Main Authors: Jean Louis K. E Fendji, Diane C. M. Tala, Blaise O. Yenke, Marcellin Atemkeng
Format: Article
Language: English
Published: Taylor & Francis Group 2022-12-01
Series: Applied Artificial Intelligence
Online Access: http://dx.doi.org/10.1080/08839514.2022.2095039
collection DOAJ
description Automatic Speech Recognition (ASR) is an active field of research due to its large number of applications and the proliferation of interfaces or computing devices that can support speech processing. However, the bulk of applications are based on well-resourced languages that overshadow under-resourced ones. Yet, ASR represents an undeniable means to promote such languages, especially when designing human-to-human or human-to-machine systems involving illiterate people. An approach to design an ASR system targeting under-resourced languages is to start with a limited vocabulary. ASR using a limited vocabulary is a subset of the speech recognition problem that focuses on the recognition of a small number of words or sentences. This paper aims to provide a comprehensive view of mechanisms behind ASR systems as well as techniques, tools, projects, recent contributions, and possible future directions in ASR using a limited vocabulary. This work consequently provides a way forward when designing an ASR system using limited vocabulary. Although an emphasis is put on limited vocabulary, most of the tools and techniques reported in this survey can be applied to ASR systems in general. 
Abbreviations: ACC: Accuracy; AM: Acoustic Model; ASR: Automatic Speech Recognition; BD-4SK-ASR: Basic Dataset for Sorani Kurdish Automatic Speech Recognition; CER: Character Error Rate; CMU: Carnegie Mellon University; CNN: Convolutional Neural Network; CNTK: CogNitive ToolKit; CUED: Cambridge University Engineering Department; DCT: Discrete Cosine Transformation; DL: Deep Learning; DNN: Deep Neural Network; DRL: Deep Reinforcement Learning; DWT: Discrete Wavelet Transform; FFT: Fast Fourier Transformation; GMM: Gaussian Mixture Model; HMM: Hidden Markov Model; HTK: Hidden Markov Model ToolKit; JASPER: Just Another Speech Recognizer; LDA: Linear Discriminant Analysis; LER: Letter Error Rate; LGB: Light Gradient Boosting Machine; LM: Language Model; LPC: Linear Predictive Coding; LVCSR: Large Vocabulary Continuous Speech Recognition; LVQ: Learning Vector Quantization Algorithm; MFCC: Mel-Frequency Cepstrum Coefficient; ML: Machine Learning; PCM: Pulse-Code Modulation; PPVT: Peabody Picture Vocabulary Test; RASTA: RelAtive SpecTral; RLAT: Rapid Language Adaptation Toolkit; S2ST: Speech-to-Speech Translation; SAPI: Speech Application Programming Interface; SDK: Software Development Kit; SVASR: Small Vocabulary Automatic Speech Recognition; WER: Word Error Rate
id doaj.art-b2589f32d40d41cf9d5ba47939a773ca
institution Directory Open Access Journal
issn 0883-9514
1087-6545
spelling doaj.art-b2589f32d40d41cf9d5ba47939a773ca; 2023-11-02T13:36:38Z; eng; Taylor & Francis Group; Applied Artificial Intelligence; 0883-9514; 1087-6545; 2022-12-01; 36; 1; 10.1080/08839514.2022.2095039; 2095039; Automatic Speech Recognition Using Limited Vocabulary: A Survey
Jean Louis K. E Fendji (University Institute of Technology, University of Ngaoundere)
Diane C. M. Tala (University of Ngaoundere)
Blaise O. Yenke (University Institute of Technology, University of Ngaoundere)
Marcellin Atemkeng (Rhodes University)
http://dx.doi.org/10.1080/08839514.2022.2095039