Characterizing and recognizing spoken corrections in human-computer dialog
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998.
Main Author: | Levow, Gina-Anne |
---|---|
Other Authors: | Robert C. Berwick. |
Format: | Thesis |
Language: | eng |
Published: | Massachusetts Institute of Technology, 2009 |
Subjects: | Electrical Engineering and Computer Science |
Online Access: | http://hdl.handle.net/1721.1/47705 |
_version_ | 1826199513972342784 |
---|---|
author | Levow, Gina-Anne |
author2 | Robert C. Berwick. |
collection | MIT |
description | Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. |
first_indexed | 2024-09-23T11:21:29Z |
format | Thesis |
id | mit-1721.1/47705 |
institution | Massachusetts Institute of Technology |
language | eng |
last_indexed | 2024-09-23T11:21:29Z |
publishDate | 2009 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/47705 2020-07-14T22:12:52Z Characterizing and recognizing spoken corrections in human-computer dialog Levow, Gina-Anne Robert C. Berwick. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Electrical Engineering and Computer Science Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. 103-106). Miscommunication in human-computer spoken language systems is unavoidable. Recognition failures on the part of the system necessitate frequent correction attempts by the user. Unfortunately and counterintuitively, users' attempts to speak more clearly in the face of recognition errors actually lead to decreased recognition accuracy. The difficulty of correcting these errors, in turn, leads to user frustration and poor assessments of system quality. Most current approaches to identifying corrections rely on detecting violations of task or belief models; such approaches are ineffective where these constraints are weak and recognition results are inaccurate or unavailable. In contrast, the approach pursued in this thesis uses the acoustic contrasts between original inputs and repeat corrections to identify corrections in a more content- and context-independent fashion. This thesis quantifies and builds upon the observation that suprasegmental features, such as duration, pause, and pitch, play a crucial role in distinguishing corrections from other forms of input to spoken language systems. These features can also be used to identify spoken corrections and to explain reductions in recognition accuracy for these utterances. By providing a detailed characterization of acoustic-prosodic changes in corrections relative to original inputs in a voice-only system, this thesis contributes to natural language processing and spoken language understanding. We treat systematic acoustic variability in speech recognizer input as a source of new information for interpreting the speaker's corrective intent, rather than simply as noise or user error. We apply a machine-learning technique, decision trees, to identify spoken corrections, achieving accuracy close to human levels of performance for corrections of misrecognition errors using acoustic-prosodic information. This process is simple and local: it depends neither on a perfect transcription of the recognition string nor on complex reasoning based on the full conversation. We further extend the conventional analysis of speaking styles beyond the 'read' versus 'conversational' contrast to include extreme clear speech, describing how words in this style diverge from phonological and durational models. by Gina-Anne Levow. Ph.D. 2009-10-01T15:33:42Z 2009-10-01T15:33:42Z 1998 1998 Thesis http://hdl.handle.net/1721.1/47705 42345174 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 106 p. application/pdf Massachusetts Institute of Technology |
title | Characterizing and recognizing spoken corrections in human-computer dialog |
topic | Electrical Engineering and Computer Science |
url | http://hdl.handle.net/1721.1/47705 |