Analysis of Morph-Based Language Modeling and Speech Recognition in Slovak

The inflection of the Slovak language causes a large number of unique word forms, which produces not only a large vocabulary, but also a number of out-of-vocabulary words. Morph-based language models solve this problem by decomposition of inflected word forms into small sub-word units and resolve th...

Bibliographic Details
Main Authors: Jan Stas, Daniel Hladek, Jozef Juhar, Daniel Zlacky
Format: Article
Language: English
Published: VSB-Technical University of Ostrava 2012-01-01
Series: Advances in Electrical and Electronic Engineering
Subjects: automatic word segmentation; language modeling; morphological analysis; speech recognition
Online Access: http://advances.utc.sk/index.php/AEEE/article/view/717
_version_ 1797827063064821760
author Jan Stas
Daniel Hladek
Jozef Juhar
Daniel Zlacky
author_sort Jan Stas
collection DOAJ
description The rich inflection of the Slovak language produces a large number of unique word forms, which leads not only to a large vocabulary but also to many out-of-vocabulary words. Morph-based language models address this problem by decomposing inflected word forms into small sub-word units, which also mitigates the general problem of sparse training data. In this paper, we present several rule-based and data-driven approaches to the automatic segmentation of words into morphs. The segmented data are then used to model the Slovak language for large-vocabulary continuous speech recognition. Preliminary results show a significant decrease in the number of out-of-vocabulary words and a reduction in the perplexity of the resulting language model.
first_indexed 2024-04-09T12:42:15Z
format Article
id doaj.art-c5b5500704fa48d6830c6040cc1a466b
institution Directory Open Access Journal
issn 1336-1376
1804-3119
language English
last_indexed 2024-04-09T12:42:15Z
publishDate 2012-01-01
publisher VSB-Technical University of Ostrava
record_format Article
series Advances in Electrical and Electronic Engineering
spelling doaj.art-c5b5500704fa48d6830c6040cc1a466b | 2023-05-14T20:50:08Z | eng | VSB-Technical University of Ostrava | Advances in Electrical and Electronic Engineering | 1336-1376 | 1804-3119 | 2012-01-01 | 10 | 4 | 291 | 296 | 10.15598/aeee.v10i4.717 | 557 | Analysis of Morph-Based Language Modeling and Speech Recognition in Slovak | Jan Stas (0), Daniel Hladek (1), Jozef Juhar (2), Daniel Zlacky (3) | Affiliation of authors 0-3: Department of Electronics and Multimedia Communications, Faculty of Electrical Engineering and Informatics, Technical University of Kosice, Kosice | The inflection of the Slovak language causes a large number of unique word forms, which produces not only a large vocabulary, but also a number of out-of-vocabulary words. Morph-based language models solve this problem by decomposition of inflected word forms into small sub-word units and resolve the general problem of sparsity of the training data. In this paper, we present several rule-based and data-driven approaches to the automatic segmentation of words into morphs. These data are later used in the modeling of the Slovak language for large vocabulary continuous speech recognition. Preliminary results show a significant decrease in the number of out-of-vocabulary words and reduction of resultant language model perplexity. | http://advances.utc.sk/index.php/AEEE/article/view/717 | automatic word segmentation | language modeling | morphological analysis | speech recognition
title Analysis of Morph-Based Language Modeling and Speech Recognition in Slovak
topic automatic word segmentation
language modeling
morphological analysis
speech recognition
url http://advances.utc.sk/index.php/AEEE/article/view/717