Methodology for Training Small Domain-specific Language Models and Its Application in Service Robot Speech Interface

Bibliographic Details
Main Authors: ONDAS Stanislav, JUHAR Jozef, HOLCER Roland
Format: Article
Language: English
Published: Editura Universităţii din Oradea 2014-05-01
Series: Journal of Electrical and Electronics Engineering
Online Access: http://electroinf.uoradea.ro/images/articles/CERCETARE/Reviste/JEEE/JEEE_V7_N1_MAY_2014/Ondas_may2014.pdf
Description
Summary: The paper introduces a novel methodology for training small domain-specific language models from the domain vocabulary alone. The methodology is intended for situations in which no training data are available and preparing an appropriate deterministic grammar is not a trivial task. It consists of two phases. In the first phase, a “random” deterministic grammar, which can generate all possible combinations of unigrams and bigrams, is constructed from the vocabulary. This random grammar then serves to generate the training corpus, and a “random” n-gram model is trained on the generated corpus. In the second phase, this model can be adapted. Evaluation of the proposed approach has shown that the methodology is usable for small domains. The assessment results favor the designed method over constructing an appropriate deterministic grammar.
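The first phase described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not specify the grammar format, the n-gram order, or any smoothing, so the sketch assumes that the "random" grammar simply enumerates every single word and every ordered word pair as a sentence, and that a plain maximum-likelihood bigram model is trained on the result. The function names `generate_random_corpus` and `train_bigram_model`, the boundary markers `<s>` and `</s>`, and the three-word vocabulary are all hypothetical.

```python
from collections import Counter
from itertools import product


def generate_random_corpus(vocabulary):
    """Enumerate every unigram and every ordered bigram over the vocabulary.

    Assumption: this stands in for the corpus generated from the "random" grammar.
    """
    words = sorted(set(vocabulary))
    unigrams = [(w,) for w in words]
    bigrams = list(product(words, repeat=2))
    return unigrams + bigrams


def train_bigram_model(corpus):
    """Estimate maximum-likelihood bigram probabilities P(curr | prev).

    Each sentence is wrapped in <s> and </s> boundary markers (an assumption).
    """
    pair_counts = Counter()
    history_counts = Counter()
    for sentence in corpus:
        tokens = ["<s>", *sentence, "</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            pair_counts[(prev, curr)] += 1
            history_counts[prev] += 1
    return {
        pair: count / history_counts[pair[0]]
        for pair, count in pair_counts.items()
    }


# Hypothetical domain vocabulary for a robot speech interface.
vocabulary = ["go", "stop", "left"]
corpus = generate_random_corpus(vocabulary)
model = train_bigram_model(corpus)
```

Because every word combination occurs equally often in such a corpus, the resulting model assigns the same probability to every word after a given history, which is one way to read the term "random" model. The adaptation in the second phase is not covered here, since the summary gives no details about it.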
ISSN: 1844-6035