Effect of Time-domain Windowing on Isolated Speech Recognition System Performance
Speech recognition systems extract textual data from the speech signal. Research in the speech recognition domain is challenging due to the large variability of the speech signal. A variety of signal processing and machine learning techniques have been explored to achieve better recogn...
Main Authors: | Ananthakrishna Thalengala, H. Anitha, T. Girisha |
---|---|
Format: | Article |
Language: | English |
Published: | Polish Academy of Sciences, 2022-03-01 |
Series: | International Journal of Electronics and Telecommunications |
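Subjects: | hidden Markov model (HMM); isolated speech recognition (ISR) system; Kannada language; mono-phone model; Mel frequency cepstral coefficients (MFCC) |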
Online Access: | https://journals.pan.pl/Content/122817/PDF/23-3303_Thalengala_L.pdf |
_version_ | 1818489279880364032 |
---|---|
author | Ananthakrishna Thalengala, H. Anitha, T. Girisha |
collection | DOAJ |
description | Speech recognition systems extract textual data from the speech signal. Research in the speech recognition domain is challenging due to the large variability of the speech signal. A variety of signal processing and machine learning techniques have been explored to achieve better recognition accuracy. Speech is highly non-stationary, so analysis is carried out over a short time-domain window, or frame. In speech recognition, cepstral features, namely Mel frequency cepstral coefficients (MFCC), are commonly used and are extracted from short time frames. The effectiveness of the features depends on the duration of the time window chosen. The present study investigates the optimal time-window duration for extracting cepstral features in the context of speech recognition. A speaker-independent speech recognition system for the Kannada language is considered for the analysis. Speech utterances from a Kannada news corpus, recorded from different speakers, are used to create the speech database. The Hidden Markov Model Toolkit (HTK) is used to implement the speech recognition system. The MFCCs, along with their first and second derivative coefficients, form the feature vectors. The pronunciation dictionary required for the study was built manually for a mono-phone system. Experiments were carried out and the results analyzed for different time-window lengths, using an overlapping Hamming window. The best average word recognition accuracy, 61.58%, was obtained for a window length of 110 ms. This accuracy is comparable with that of similar work in the literature. The experiments show that the best word recognition performance can be achieved by tuning the window length to its optimum value. |
first_indexed | 2024-12-10T17:02:00Z |
format | Article |
id | doaj.art-e20b868bf8644a0fafb77bfa3b02d6d2 |
institution | Directory of Open Access Journals |
issn | 2081-8491 2300-1933 |
language | English |
last_indexed | 2024-12-10T17:02:00Z |
publishDate | 2022-03-01 |
publisher | Polish Academy of Sciences |
record_format | Article |
series | International Journal of Electronics and Telecommunications |
spelling | doaj.art-e20b868bf8644a0fafb77bfa3b02d6d2; 2022-12-22T01:40:33Z; eng; Polish Academy of Sciences; International Journal of Electronics and Telecommunications; ISSN 2081-8491, 2300-1933; 2022-03-01; vol. 68, No 1, pp. 161-166; https://doi.org/10.24425/ijet.2022.139856; Effect of Time-domain Windowing on Isolated Speech Recognition System Performance; Ananthakrishna Thalengala, H. Anitha, T. Girisha (all: Department of Electronics and Communication Engineering, Manipal Institute of Technology (MIT), Manipal Academy of Higher Education (MAHE), Manipal, Karnataka State, India); https://journals.pan.pl/Content/122817/PDF/23-3303_Thalengala_L.pdf; hidden markov model (hmm); isolated speech recognition (isr) system; kannada language; mono-phone model; mel frequency cepstral coefficients (mfcc) |
title | Effect of Time-domain Windowing on Isolated Speech Recognition System Performance |
topic | hidden Markov model (HMM); isolated speech recognition (ISR) system; Kannada language; mono-phone model; Mel frequency cepstral coefficients (MFCC) |
url | https://journals.pan.pl/Content/122817/PDF/23-3303_Thalengala_L.pdf |
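
The description in this record says that features are extracted from short, overlapping Hamming-windowed frames, and that the window length was varied up to 110 ms. The following is a minimal sketch of that framing step only, not the authors' HTK pipeline. The 16 kHz sampling rate, the 10 ms frame shift, the 25 ms comparison window, the synthetic test tone, and the helper names `hamming` and `frame_signal` are all assumptions made for illustration; none of them come from the record.

```python
import math


def hamming(n_samples):
    """Return a Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def frame_signal(signal, sample_rate, window_ms, shift_ms):
    """Split a signal into overlapping frames and apply a Hamming window to each.

    Frames that would run past the end of the signal are dropped.
    """
    win_len = int(round(sample_rate * window_ms / 1000))
    shift = int(round(sample_rate * shift_ms / 1000))
    window = hamming(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, shift):
        segment = signal[start:start + win_len]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames


# Assumed values for illustration only (not stated in the record).
SAMPLE_RATE = 16000   # samples per second
SHIFT_MS = 10         # frame shift in milliseconds

# One second of a synthetic 440 Hz tone stands in for a speech utterance.
signal = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE)
          for t in range(SAMPLE_RATE)]

# Compare an assumed 25 ms window with the 110 ms window from the description.
for window_ms in (25, 110):
    frames = frame_signal(signal, SAMPLE_RATE, window_ms, SHIFT_MS)
    print(window_ms, "ms window:", len(frames), "frames of", len(frames[0]), "samples")
```

Under these assumed settings, a longer window gives each frame more samples (finer frequency resolution) but averages over more of the non-stationary signal, which is the trade-off the study tunes. MFCC computation on each frame would follow this step and is not shown here.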