Language model parameter estimation using user transcriptions

In limited data domains, many effective language modeling techniques construct models with parameters to be estimated on an in-domain development set. However, in some domains, no such data exist beyond the unlabeled test corpus. In this work, we explore the iterative use of the recognition hypothes...

Bibliographic Details
Main Authors: Hsu, Bo-June; Glass, James R.
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: en_US
Published: Institute of Electrical and Electronics Engineers 2010
Subjects: adaptation, language modeling, speech recognition
Online Access: http://hdl.handle.net/1721.1/58944
https://orcid.org/0000-0002-3097-360X
Description: In limited data domains, many effective language modeling techniques construct models with parameters to be estimated on an in-domain development set. However, in some domains, no such data exist beyond the unlabeled test corpus. In this work, we explore the iterative use of the recognition hypotheses for unsupervised parameter estimation. We also evaluate the effectiveness of supervised adaptation using varying amounts of user-provided transcripts of utterances selected via multiple strategies. While unsupervised adaptation obtains 80% of the potential error reductions, it is outperformed by using only 300 words of user transcription. By transcribing the lowest confidence utterances first, we further obtain an effective word error rate reduction of 0.6%.
Citation: Bo-June Hsu and J. Glass. “Language model parameter estimation using user transcriptions.” Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on. 2009. 4805-4808. © 2009 IEEE
Published in: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009
DOI: http://dx.doi.org/10.1109/ICASSP.2009.4960706
Date issued: 2009-05
Date available in repository: 2010-10-07
Type: Article (http://purl.org/eprint/type/JournalArticle)
ISBN: 978-1-4244-2353-8
ISSN: 1520-6149
INSPEC Accession Number: 10701485
Sponsorship: T-Party Project
File format: application/pdf
Rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.