Analysis of nonmodal glottal event patterns with application to automatic speaker recognition

Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2008.

Bibliographic Details
Main Author:	Malyska, Nicolas, 1977-
Other Authors:	Thomas F. Quatieri.
Format:	Thesis
Language:	eng
Published:	Massachusetts Institute of Technology 2008
Subjects:	Harvard University > MIT Division of Health Sciences and Technology.
Online Access:	http://hdl.handle.net/1721.1/43804

_version_	1826199355218984960
author	Malyska, Nicolas, 1977-
author2	Thomas F. Quatieri.
author_facet	Thomas F. Quatieri. Malyska, Nicolas, 1977-
author_sort	Malyska, Nicolas, 1977-
collection	MIT
description	Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2008.
first_indexed	2024-09-23T11:18:54Z
format	Thesis
id	mit-1721.1/43804
institution	Massachusetts Institute of Technology
language	eng
last_indexed	2024-09-23T11:18:54Z
publishDate	2008
publisher	Massachusetts Institute of Technology
record_format	dspace
spelling	mit-1721.1/43804 2019-04-10T23:27:20Z Analysis of nonmodal glottal event patterns with application to automatic speaker recognition Malyska, Nicolas, 1977- Thomas F. Quatieri. Harvard University--MIT Division of Health Sciences and Technology. Harvard University--MIT Division of Health Sciences and Technology. Harvard University--MIT Division of Health Sciences and Technology. Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2008. Includes bibliographical references (p. 211-215). Regions of phonation exhibiting nonmodal characteristics are likely to contain information about speaker identity, language, dialect, and vocal-fold health. As a basis for testing such dependencies, we develop a representation of patterns in the relative timing and height of nonmodal glottal pulses. To extract the timing and height of candidate pulses, we investigate a variety of inverse-filtering schemes including maximum-entropy deconvolution that minimizes predictability of a signal and minimum-entropy deconvolution that maximizes pulse-likeness. Hybrid formulations of these methods are also considered. We then derive a theoretical framework for understanding frequency- and time-domain properties of a pulse sequence, a process that sheds light on the transformation of nonmodal pulse trains into useful parameters. In the frequency domain, we introduce the first comprehensive mathematical derivation of the effect of deterministic and stochastic source perturbation on the short-time spectrum. We also propose a pitch representation of nonmodality that provides an alternative viewpoint on the frequency content that does not rely on Fourier bases. In developing time-domain properties, we use projected low-dimensional histograms of feature vectors derived from pulse timing and height parameters.
For these features, we have found clusters of distinct pulse patterns, reflecting a wide variety of glottal-pulse phenomena including near-modal phonation, shimmer and jitter, diplophonia and triplophonia, and aperiodicity. Using temporal relationships between successive feature vectors, an algorithm by which to separate these different classes of glottal-pulse characteristics has also been developed. We have used our glottal-pulse-pattern representation to automatically test for one signal dependency: speaker dependence of glottal-pulse sequences. This choice is motivated by differences observed between talkers in our separated feature space. Using an automatic speaker verification experiment, we investigate tradeoffs in speaker dependency for short-time pulse patterns, reflecting local irregularity, as well as long-time patterns related to higher-level cyclic variations. Results, using speakers with a broad array of modal and nonmodal behaviors, indicate a high accuracy in speaker recognition performance, complementary to the use of conventional mel-cepstral features. These results suggest that there is rich structure to the source excitation that provides information about a particular speaker's identity. by Nicolas Malyska. Ph.D. 2008-12-11T18:30:23Z 2008-12-11T18:30:23Z 2008 2008 Thesis http://hdl.handle.net/1721.1/43804 261504289 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 215 p. application/pdf Massachusetts Institute of Technology
spellingShingle	Harvard University--MIT Division of Health Sciences and Technology. Malyska, Nicolas, 1977- Analysis of nonmodal glottal event patterns with application to automatic speaker recognition
title	Analysis of nonmodal glottal event patterns with application to automatic speaker recognition
title_full	Analysis of nonmodal glottal event patterns with application to automatic speaker recognition
title_fullStr	Analysis of nonmodal glottal event patterns with application to automatic speaker recognition
title_full_unstemmed	Analysis of nonmodal glottal event patterns with application to automatic speaker recognition
title_short	Analysis of nonmodal glottal event patterns with application to automatic speaker recognition
title_sort	analysis of nonmodal glottal event patterns with application to automatic speaker recognition
topic	Harvard University--MIT Division of Health Sciences and Technology.
url	http://hdl.handle.net/1721.1/43804
work_keys_str_mv	AT malyskanicolas1977 analysisofnonmodalglottaleventpatternswithapplicationtoautomaticspeakerrecognition
