Conversational scene analysis

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002.

Bibliographic Details
Main Author: Basu, Sumit
Other Authors: Alex P. Pentland.
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology 2005
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/29270
Notes: Includes bibliographical references (p. 106-109).
Physical Description: 109 leaves; application/pdf
Other identifier: 52052659
Rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582

Abstract:
In this thesis, we develop computational tools for analyzing conversations based on nonverbal auditory cues. We develop a notion of conversations as being made up of a variety of scenes: in each scene, either one speaker is holding the floor or both are speaking at equal levels. Our goal is to find conversations, find the scenes within them, determine what is happening inside the scenes, and then use the scene structure to characterize entire conversations.

We begin by developing a series of mid-level feature detectors, including a joint voicing and speech detection method that is extremely robust to noise and microphone distance. Leveraging the results of this powerful mechanism, we develop a probabilistic pitch tracking mechanism, methods for estimating speaking rate and energy, and means to segment the stream into multiple speakers, all in significant noise conditions. These features give us the ability to sense the interactions and characterize the style of each speaker's behavior.

We then turn to the domain of conversations. We first show how we can very accurately detect conversations from independent or dependent auditory streams with measures derived from our mid-level features. We then move to developing methods to accurately classify and segment a conversation into scenes. We also show preliminary results on characterizing the varying nature of the speakers' behavior during these regions.

Finally, we design features to describe entire conversations from the scene structure, and show how we can describe and browse through conversation types in this way.

by Sumit Basu. Ph.D.