Automatic Speech Recognition Using Limited Vocabulary: A Survey
Main Authors: | Jean Louis K. E. Fendji, Diane C. M. Tala, Blaise O. Yenke, Marcellin Atemkeng |
---|---|
Format: | Article |
Language: | English |
Published: | Taylor & Francis Group, 2022-12-01 |
Series: | Applied Artificial Intelligence |
Online Access: | http://dx.doi.org/10.1080/08839514.2022.2095039 |
author | Jean Louis K. E. Fendji, Diane C. M. Tala, Blaise O. Yenke, Marcellin Atemkeng |
collection | DOAJ |
description | Automatic Speech Recognition (ASR) is an active field of research due to its large number of applications and the proliferation of interfaces or computing devices that can support speech processing. However, the bulk of applications are based on well-resourced languages that overshadow under-resourced ones. Yet, ASR represents an undeniable means to promote such languages, especially when designing human-to-human or human-to-machine systems involving illiterate people. One approach to designing an ASR system targeting under-resourced languages is to start with a limited vocabulary. ASR using a limited vocabulary is a subset of the speech recognition problem that focuses on the recognition of a small number of words or sentences. This paper aims to provide a comprehensive view of the mechanisms behind ASR systems as well as the techniques, tools, projects, recent contributions, and possible future directions in ASR using a limited vocabulary. This work consequently provides a way forward when designing an ASR system using limited vocabulary. Although the emphasis is on limited vocabulary, most of the tools and techniques reported in this survey can be applied to ASR systems in general.
Abbreviations: ACC: Accuracy; AM: Acoustic Model; ASR: Automatic Speech Recognition; BD-4SK-ASR: Basic Dataset for Sorani Kurdish Automatic Speech Recognition; CER: Character Error Rate; CMU: Carnegie Mellon University; CNN: Convolutional Neural Network; CNTK: CogNitive ToolKit; CUED: Cambridge University Engineering Department; DCT: Discrete Cosine Transformation; DL: Deep Learning; DNN: Deep Neural Network; DRL: Deep Reinforcement Learning; DWT: Discrete Wavelet Transform; FFT: Fast Fourier Transformation; GMM: Gaussian Mixture Model; HMM: Hidden Markov Model; HTK: Hidden Markov Model ToolKit; JASPER: Just Another Speech Recognizer; LDA: Linear Discriminant Analysis; LER: Letter Error Rate; LGB: Light Gradient Boosting Machine; LM: Language Model; LPC: Linear Predictive Coding; LVCSR: Large Vocabulary Continuous Speech Recognition; LVQ: Learning Vector Quantization Algorithm; MFCC: Mel-Frequency Cepstrum Coefficient; ML: Machine Learning; PCM: Pulse-Code Modulation; PPVT: Peabody Picture Vocabulary Test; RASTA: RelAtive SpecTral; RLAT: Rapid Language Adaptation Toolkit; S2ST: Speech-to-Speech Translation; SAPI: Speech Application Programming Interface; SDK: Software Development Kit; SVASR: Small Vocabulary Automatic Speech Recognition; WER: Word Error Rate |
issn | 0883-9514, 1087-6545 |