Analysis of Morph-Based Language Modeling and Speech Recognition in Slovak
The inflection of the Slovak language causes a large number of unique word forms, which produces not only a large vocabulary, but also a number of out-of-vocabulary words. Morph-based language models solve this problem by decomposition of inflected word forms into small sub-word units and resolve th...
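The abstract's central claim is that splitting inflected word forms into morphs reduces the number of out-of-vocabulary (OOV) words. The toy Python sketch below illustrates that effect. It is not the authors' method: the suffix list, the minimum stem length, the `+` marker on suffix morphs, and the helper names `segment`, `to_morphs` and `oov_rate` are all hypothetical, chosen only for illustration. The example words are forms of the Slovak nouns *hrad* (castle) and *dom* (house).

```python
from typing import Iterable, List

# Hypothetical toy suffix inventory (assumption, for illustration only);
# this is NOT the rule-based or data-driven segmentation used in the paper.
SUFFIXES = sorted(["om", "ov", "u", "y", "e"], key=len, reverse=True)
MIN_STEM = 3  # assumed minimum stem length, so short words such as "dom" stay whole


def segment(word: str) -> List[str]:
    """Split a word into stem + suffix morph (suffix marked with '+') if a known suffix matches."""
    for suffix in SUFFIXES:  # longest suffixes are tried first
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return [word[: -len(suffix)], "+" + suffix]
    return [word]


def to_morphs(words: Iterable[str]) -> List[str]:
    """Flatten a word sequence into a morph sequence."""
    return [morph for word in words for morph in segment(word)]


def oov_rate(train: Iterable[str], test: Iterable[str]) -> float:
    """Fraction of test tokens that never occur among the training tokens."""
    vocab = set(train)
    test_tokens = list(test)
    return sum(tok not in vocab for tok in test_tokens) / len(test_tokens)


train_words = "hrad hradu hradom dom domy".split()
test_words = "hrad hrady domu domom".split()

word_oov = oov_rate(train_words, test_words)
morph_oov = oov_rate(to_morphs(train_words), to_morphs(test_words))
print(f"word-level OOV rate:  {word_oov:.2f}")
print(f"morph-level OOV rate: {morph_oov:.2f}")
```

Here three of the four test words (*hrady*, *domu*, *domom*) are unseen as whole words. After segmentation, every stem and suffix morph in the test set has already occurred in training. The morph-level rate is counted over morph tokens rather than words, so the two figures are not directly comparable. The sketch only shows the qualitative effect the abstract describes.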
Main Authors: | Jan Stas, Daniel Hladek, Jozef Juhar, Daniel Zlacky |
---|---|
Format: | Article |
Language: | English |
Published: | VSB-Technical University of Ostrava, 2012-01-01 |
Series: | Advances in Electrical and Electronic Engineering |
Subjects: | automatic word segmentation; language modeling; morphological analysis; speech recognition |
Online Access: | http://advances.utc.sk/index.php/AEEE/article/view/717 |
author | Jan Stas, Daniel Hladek, Jozef Juhar, Daniel Zlacky |
---|---|
collection | DOAJ |
description | The inflection of the Slovak language causes a large number of unique word forms, which produces not only a large vocabulary but also many out-of-vocabulary words. Morph-based language models solve this problem by decomposing inflected word forms into small sub-word units, which also resolves the general problem of training data sparsity. In this paper, we present several rule-based and data-driven approaches to the automatic segmentation of words into morphs. The segmented data are then used to model the Slovak language for large vocabulary continuous speech recognition. Preliminary results show a significant decrease in the number of out-of-vocabulary words and a reduction in the resulting language model perplexity. |
first_indexed | 2024-04-09T12:42:15Z |
format | Article |
id | doaj.art-c5b5500704fa48d6830c6040cc1a466b |
institution | Directory of Open Access Journals |
issn | 1336-1376, 1804-3119 |
language | English |
last_indexed | 2024-04-09T12:42:15Z |
publishDate | 2012-01-01 |
publisher | VSB-Technical University of Ostrava |
series | Advances in Electrical and Electronic Engineering |
doi | 10.15598/aeee.v10i4.717 |
citation | vol. 10, no. 4, pp. 291-296 |
affiliation | Department of Electronics and Multimedia Communications, Faculty of Electrical Engineering and Informatics, Technical University of Kosice, Kosice (all four authors) |
title | Analysis of Morph-Based Language Modeling and Speech Recognition in Slovak |
topic | automatic word segmentation; language modeling; morphological analysis; speech recognition |
url | http://advances.utc.sk/index.php/AEEE/article/view/717 |