Towards multi-domain speech understanding with flexible and dynamic vocabulary

Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001.

Bibliographic Details
Main Author:	Chung, Grace Yuet-Chee
Other Authors:	Stephanie Seneff.
Format:	Thesis
Language:	eng
Published:	Massachusetts Institute of Technology 2005
Subjects:	Electrical Engineering and Computer Science.
Online Access:	http://hdl.handle.net/1721.1/8925

Notes:	Includes bibliographical references (p. 201-208).
Abstract:	In developing telephone-based conversational systems, we foresee future systems capable of supporting multiple domains and flexible vocabulary. Users can pursue several topics of interest within a single telephone call, and the system is able to switch transparently among domains within a single dialog. This system is able to detect the presence of any out-of-vocabulary (OOV) words, and automatically hypothesizes each of their pronunciation, spelling and meaning. These can be confirmed with the user and the new words are subsequently incorporated into the recognizer lexicon for future use. This thesis will describe our work towards realizing such a vision, using a multi-stage architecture.

Our work is focused on organizing the application of linguistic constraints in order to accommodate multiple domain topics and dynamic vocabulary at the spoken input. The philosophy is to exclusively apply below word-level linguistic knowledge at the initial stage. Such knowledge is domain-independent and general to all of the English language. Hence, this is broad enough to support any unknown words that may appear at the input, as well as input from several topic domains. At the same time, the initial pass narrows the search space for the next stage, where domain-specific knowledge that resides at the word-level or above is applied. In the second stage, we envision several parallel recognizers, each with higher order language models tailored specifically to its domain. A final decision algorithm selects a final hypothesis from the set of parallel recognizers.

Part of our contribution is the development of a novel first stage which attempts to maximize linguistic constraints, using only below word-level information. The goals are to prevent sequences of unknown words from being pruned away prematurely while maintaining performance on in-vocabulary items, as well as reducing the search space for later stages. Our solution coordinates the application of various subword level knowledge sources. The recognizer lexicon is implemented with an inventory of linguistically motivated units called morphs, which are syllables augmented with spelling and word position. This first stage is designed to output a phonetic network so that we are not committed to the initial hypotheses. This adds robustness, as later stages can propose words directly from phones.

To maximize performance on the first stage, much of our focus has centered on the integration of a set of hierarchical sublexical models into this first pass. To do this, we utilize the ANGIE framework which supports a trainable context-free grammar, and is designed to acquire subword-level and phonological information statistically. Its models can generalize knowledge about word structure, learned from in-vocabulary data, to previously unseen words. We explore methods for collapsing the ANGIE models into a finite-state transducer (FST) representation which enables these complex models to be efficiently integrated into recognition. The ANGIE-FST needs to encapsulate the hierarchical knowledge of ANGIE and replicate ANGIE's ability to support previously unobserved phonetic sequences ...
Statement of Responsibility:	by Grace Chung.
Degree:	Ph.D.
Rights:	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
Extent:	208 p.
Files:	application/pdf (17634800 bytes, 17634557 bytes)
Identifiers:	mit-1721.1/8925; http://hdl.handle.net/1721.1/8925; 48971905
Record Timestamps:	2005-08-23T16:24:40Z; 2019-04-11T07:13:39Z

Similar Items