Language Model Adaptation Using Machine-Translated Text for Resource-Deficient Languages
Main Authors: | Sadaoki Furui, Koji Iwano, Arnar Thor Jensson |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2009-01-01 |
Series: | EURASIP Journal on Audio, Speech, and Music Processing |
Online Access: | http://dx.doi.org/10.1155/2008/573832 |
author | Sadaoki Furui, Koji Iwano, Arnar Thor Jensson |
---|---|
collection | DOAJ |
description | Text corpus size is an important issue when building a language model (LM), particularly for languages where little data is available. This paper introduces an LM adaptation technique that improves an LM built from a small amount of task-dependent text with the help of a machine-translated text corpus. Icelandic speech recognition experiments were performed using data machine-translated (MT) from English to Icelandic on a word-by-word and on a sentence-by-sentence basis. Interpolating the baseline LM with an LM built from either the word-by-word or the sentence-by-sentence translated text reduced the word error rate significantly when the manually obtained utterances used as a baseline were very sparse. |
first_indexed | 2024-12-21T18:58:44Z |
id | doaj.art-a90d69b049df4cb3b706d2bc2d8b3975 |
institution | Directory of Open Access Journals |
issn | 1687-4714, 1687-4722 |
last_indexed | 2024-12-21T18:58:44Z |
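The abstract describes interpolating the baseline LM with an LM built from machine-translated text. The snippet below is a minimal sketch of standard linear interpolation, P(w) = λ·P_base(w) + (1 − λ)·P_MT(w), with λ chosen to minimize perplexity on held-out text. It is not the authors' implementation: the abstract does not specify n-gram order, smoothing, or how the weight was estimated, so this sketch assumes add-one-smoothed unigram models and a simple grid search. The helper names (`train_unigram`, `interpolate`, `perplexity`, `tune_lambda`) and the toy Icelandic sentences are hypothetical.

```python
from collections import Counter
import math


def train_unigram(tokens, vocab):
    """Add-one (Laplace) smoothed unigram model over a fixed vocabulary.

    Assumes every token in `tokens` is in `vocab`, so the probabilities sum to 1.
    """
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(p_base, p_mt, lam):
    """Linear interpolation: P(w) = lam * P_base(w) + (1 - lam) * P_mt(w)."""
    return {w: lam * p_base[w] + (1.0 - lam) * p_mt[w] for w in p_base}


def perplexity(model, tokens):
    """exp of the average negative log-probability per token."""
    log_prob = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


def tune_lambda(p_base, p_mt, heldout, steps=19):
    """Grid-search lam strictly inside (0, 1) for the lowest held-out perplexity."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return min(grid, key=lambda lam: perplexity(interpolate(p_base, p_mt, lam), heldout))


if __name__ == "__main__":
    # Toy data (hypothetical): a tiny in-domain corpus and a larger MT-derived one.
    base_text = "ég fer heim".split()
    mt_text = "ég fer í skólann í dag og ég fer heim í kvöld".split()
    heldout = "ég fer í skólann".split()
    vocab = set(base_text) | set(mt_text) | set(heldout)

    p_base = train_unigram(base_text, vocab)
    p_mt = train_unigram(mt_text, vocab)
    lam = tune_lambda(p_base, p_mt, heldout)
    mixed = interpolate(p_base, p_mt, lam)
    print(f"lambda={lam:.2f}")
    print(f"baseline PPL={perplexity(p_base, heldout):.2f}")
    print(f"interpolated PPL={perplexity(mixed, heldout):.2f}")
```

In a higher-order n-gram model, the same mixing rule is applied to each conditional probability P(w | h). The weight is also commonly estimated with EM on held-out data rather than by grid search, and n-gram toolkits such as SRILM can perform this mixing directly.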