Language Model Adaptation Using Machine-Translated Text for Resource-Deficient Languages

Bibliographic Details
Main Authors:	Sadaoki Furui, Koji Iwano, Arnar Thor Jensson
Format:	Article
Language:	English
Published:	SpringerOpen 2009-01-01
Series:	EURASIP Journal on Audio, Speech, and Music Processing
Online Access:	http://dx.doi.org/10.1155/2008/573832

Description
Summary:	Text corpus size is an important issue when building a language model (LM), particularly for languages where little data is available. This paper introduces an LM adaptation technique that improves an LM built from a small amount of task-dependent text with the help of a machine-translated text corpus. Icelandic speech recognition experiments were performed using data machine-translated (MT) from English to Icelandic on a word-by-word and a sentence-by-sentence basis. Interpolating the baseline LM with an LM built from either the word-by-word or the sentence-by-sentence translated text reduced the word error rate significantly when the manually obtained utterances used for the baseline were very sparse.
ISSN:	1687-4714 1687-4722
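
The LM interpolation mentioned in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes simple linear interpolation with a single weight (here called lam), uses add-one smoothed unigram models instead of the n-gram models a speech recognizer would use, and runs on invented placeholder tokens rather than Icelandic data. All function names are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences, vocab):
    """Add-one smoothed unigram LM over a fixed vocabulary (a simplification)."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts[w] for w in vocab) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(p_base, p_mt, lam):
    """Linear interpolation: lam * P_base(w) + (1 - lam) * P_mt(w)."""
    return {w: lam * p_base[w] + (1 - lam) * p_mt[w] for w in p_base}


def perplexity(lm, sentences):
    """Per-word perplexity of a unigram LM on held-out sentences."""
    words = [w for s in sentences for w in s.split()]
    log_prob = sum(math.log(lm[w]) for w in words)
    return math.exp(-log_prob / len(words))


# Toy placeholder corpora (assumptions, not data from the paper).
base_text = ["a b a c"]                        # sparse task-dependent text
mt_text = ["a b c d", "b c d d", "c d a b"]    # larger machine-translated text
held_out = ["a d c b"]                         # development text

vocab = sorted({w for s in base_text + mt_text + held_out for w in s.split()})
p_base = train_unigram(base_text, vocab)
p_mt = train_unigram(mt_text, vocab)
p_mix = interpolate(p_base, p_mt, lam=0.5)

print("baseline perplexity:    ", perplexity(p_base, held_out))
print("interpolated perplexity:", perplexity(p_mix, held_out))
```

In practice the weight would be tuned on held-out text, for example by picking the value that minimizes perplexity, and the component models would be smoothed n-gram LMs.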

Similar Items