Language Model Adaptation Using Machine-Translated Text for Resource-Deficient Languages

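Text corpus size is an important issue when building a language model (LM), particularly for languages where little data is available. This paper introduces an LM adaptation technique that improves an LM built from a small amount of task-dependent text with the help of a machine-translated text corpus. Icelandic speech recognition experiments were performed using data machine-translated (MT) from English to Icelandic on a word-by-word and sentence-by-sentence basis. LM interpolation using the baseline LM and an LM built from either word-by-word or sentence-by-sentence translated text reduced the word error rate significantly when the manually obtained utterances used as a baseline were very sparse.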
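
The record's description mentions interpolating a baseline LM with an LM built from machine-translated text. As a rough illustration only, and not the authors' implementation, the sketch below linearly interpolates two add-one-smoothed unigram models and picks the weight that minimizes perplexity on held-out text. The toy sentences, the unigram order, the smoothing, the grid search over the weight, and all function names are assumptions made for this example.

```python
import math
from collections import Counter


def train_unigram(tokens, vocab):
    """Add-one smoothed unigram LM over a fixed vocabulary (illustrative choice)."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(p_base, p_mt, lam):
    """Linear interpolation: P(w) = lam * P_base(w) + (1 - lam) * P_mt(w)."""
    return {w: lam * p_base[w] + (1 - lam) * p_mt[w] for w in p_base}


def perplexity(model, tokens):
    """Perplexity of a unigram model on a token sequence."""
    log_prob = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


# Hypothetical toy data: a tiny in-domain corpus, a larger machine-translated
# corpus, and a held-out set used to tune the interpolation weight.
base_text = "hvar er hotel".split()
mt_text = "hvar er hotel hvar er flug takk fyrir".split()
heldout = "hvar er flug takk".split()
vocab = set(base_text) | set(mt_text) | set(heldout)

p_base = train_unigram(base_text, vocab)
p_mt = train_unigram(mt_text, vocab)

# Grid search over lam in {0.0, 0.1, ..., 1.0}; lam = 1.0 is the baseline alone.
best_lam, best_ppl = min(
    ((k / 10, perplexity(interpolate(p_base, p_mt, k / 10), heldout)) for k in range(11)),
    key=lambda pair: pair[1],
)
print(f"best lambda = {best_lam:.1f}, held-out perplexity = {best_ppl:.3f}")
```

Because the grid includes a weight of 1.0, the selected mixture can never score worse on the held-out set than the baseline model alone. Whether it helps on real data depends on how well the translated text matches the task.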

Bibliographic Details
Main Authors: Sadaoki Furui, Koji Iwano, Arnar Thor Jensson
Format: Article
Language: English
Published: SpringerOpen 2009-01-01
Series: EURASIP Journal on Audio, Speech, and Music Processing
Online Access: http://dx.doi.org/10.1155/2008/573832
collection DOAJ
description Text corpus size is an important issue when building a language model (LM), particularly for languages where little data is available. This paper introduces an LM adaptation technique that improves an LM built from a small amount of task-dependent text with the help of a machine-translated text corpus. Icelandic speech recognition experiments were performed using data machine-translated (MT) from English to Icelandic on a word-by-word and sentence-by-sentence basis. LM interpolation using the baseline LM and an LM built from either word-by-word or sentence-by-sentence translated text reduced the word error rate significantly when the manually obtained utterances used as a baseline were very sparse.
first_indexed 2024-12-21T18:58:44Z
format Article
id doaj.art-a90d69b049df4cb3b706d2bc2d8b3975
institution Directory Open Access Journal
issn 1687-4714
1687-4722
language English
last_indexed 2024-12-21T18:58:44Z
publishDate 2009-01-01
publisher SpringerOpen
record_format Article
series EURASIP Journal on Audio, Speech, and Music Processing