Feature-based pronunciation modeling for automatic speech recognition

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005.

Bibliographic Details
Main Author:	Livescu, Karen, 1975-
Other Authors:	James R. Glass.
Format:	Thesis
Language:	eng
Published:	Massachusetts Institute of Technology 2008
Subjects:	Electrical Engineering and Computer Science.
Online Access:	http://dspace.mit.edu/handle/1721.1/34488 http://hdl.handle.net/1721.1/34488

_version_	1826207932106145792
author	Livescu, Karen, 1975-
author2	James R. Glass.
author_facet	James R. Glass. Livescu, Karen, 1975-
author_sort	Livescu, Karen, 1975-
collection	MIT
description	Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005.
first_indexed	2024-09-23T13:57:13Z
format	Thesis
id	mit-1721.1/34488
institution	Massachusetts Institute of Technology
language	eng
last_indexed	2024-09-23T13:57:13Z
publishDate	2008
publisher	Massachusetts Institute of Technology
record_format	dspace
spelling	mit-1721.1/34488 2019-04-12T09:40:18Z Feature-based pronunciation modeling for automatic speech recognition Livescu, Karen, 1975- James R. Glass. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. Includes bibliographical references (p. 131-140). Spoken language, especially conversational speech, is characterized by great variability in word pronunciation, including many variants that differ grossly from dictionary prototypes. This is one factor in the poor performance of automatic speech recognizers on conversational speech. One approach to handling this variation consists of expanding the dictionary with phonetic substitution, insertion, and deletion rules. Common rule sets, however, typically leave many pronunciation variants unaccounted for and increase word confusability due to the coarse granularity of phone units. We present an alternative approach, in which many types of variation are explained by representing a pronunciation as multiple streams of linguistic features rather than a single stream of phones. Features may correspond to the positions of the speech articulators, such as the lips and tongue, or to acoustic or perceptual categories. By allowing for asynchrony between features and per-feature substitutions, many pronunciation changes that are difficult to account for with phone-based models become quite natural. Although it is well-known that many phenomena can be attributed to this "semi-independent evolution" of features, previous models of pronunciation variation have typically not taken advantage of this. In particular, we propose a class of feature-based pronunciation models represented as dynamic Bayesian networks (DBNs). The DBN framework allows us to naturally represent the factorization of the state space of feature combinations into feature-specific factors, as well as providing standard algorithms for inference and parameter learning. We investigate the behavior of such a model in isolation using manually transcribed words. Compared to a phone-based baseline, the feature-based model has both higher coverage of observed pronunciations and higher recognition rate for isolated words. We also discuss the ways in which such a model can be incorporated into various types of end-to-end speech recognizers and present several examples of implemented systems, for both acoustic speech recognition and lipreading tasks. by Karen Livescu. Ph.D. 2008-03-26T20:36:55Z 2008-03-26T20:36:55Z 2005 2005 Thesis http://dspace.mit.edu/handle/1721.1/34488 http://hdl.handle.net/1721.1/34488 70847032 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/34488 http://dspace.mit.edu/handle/1721.1/7582 140 p. application/pdf Massachusetts Institute of Technology
spellingShingle	Electrical Engineering and Computer Science. Livescu, Karen, 1975- Feature-based pronunciation modeling for automatic speech recognition
title	Feature-based pronunciation modeling for automatic speech recognition
title_sort	feature based pronunciation modeling for automatic speech recognition
topic	Electrical Engineering and Computer Science.
url	http://dspace.mit.edu/handle/1721.1/34488 http://hdl.handle.net/1721.1/34488
work_keys_str_mv	AT livescukaren1975 featurebasedpronunciationmodelingforautomaticspeechrecognition

Similar Items