Combined-channel instantaneous frequency analysis for audio source separation based on comodulation

Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2008.

Bibliographic Details
Main Author: Jacobson, Barry David
Other Authors: Thomas F. Quatieri and Gert Cauwenberghs.
Format: Thesis
Language: English
Published: Massachusetts Institute of Technology, 2009
Subjects: Harvard University--MIT Division of Health Sciences and Technology.
Online Access: http://hdl.handle.net/1721.1/45911
Includes bibliographical references (p. 295-303).

Abstract:
Normal human listeners have a remarkable ability to focus on a single sound or speaker of interest and to block out competing sound sources. Individuals with hearing impairments, on the other hand, often experience great difficulty in noisy environments. The goal of our research is to develop novel signal processing methods, inspired by neural auditory processing, that can improve current speech separation systems. Such methods could potentially be useful in assistive devices for the hearing impaired and in many other communications applications. Our focus is the monaural case, in which spatial information is not available. Much perceptual evidence indicates that detecting common amplitude and frequency variation in acoustic signals plays an important role in the separation process. The physical mechanisms of sound generation in many sources produce common onsets and offsets, and correlated increases and decreases in both amplitude and frequency, across the spectral components of an individual source; these shared patterns can potentially serve as a distinctive signature of that source. However, harnessing these common modulation patterns is difficult: when spectral components of competing sources overlap within the bandwidth of a single auditory filter, the modulation envelope of the resulting waveform resembles that of neither source.
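The interference problem can be reproduced in a few lines of numerical code. The sketch below is an illustration only and is not taken from the thesis: it assumes two hypothetical sources, each contributing one component (1000 Hz with a rising envelope and 1040 Hz with a falling envelope, values chosen arbitrarily), that fall inside the same auditory filter. Each component is written directly as a complex (analytic) signal, so its envelope is simply its magnitude.

```python
import numpy as np

FS = 16000.0                        # sample rate in Hz (illustrative choice)
t = np.arange(int(0.2 * FS)) / FS   # 200 ms time axis

# Hypothetical source envelopes: source A rises while source B falls.
env_a = np.linspace(0.2, 1.0, t.size)
env_b = np.linspace(1.0, 0.2, t.size)

# One component per source, both assumed to lie inside one filter's passband.
# Writing them as complex exponentials makes |signal| equal to the envelope.
comp_a = env_a * np.exp(2j * np.pi * 1000.0 * t)
comp_b = env_b * np.exp(2j * np.pi * 1040.0 * t)

# Envelope of the mixture as seen by that filter: it beats at the 40 Hz
# difference frequency, swinging between |env_a - env_b| and env_a + env_b.
mixture_env = np.abs(comp_a + comp_b)

if __name__ == "__main__":
    # Largest deviation of the mixture envelope from each source's envelope.
    print("max |mix - A|:", np.max(np.abs(mixture_env - env_a)))
    print("max |mix - B|:", np.max(np.abs(mixture_env - env_b)))
    print("mixture envelope range:", mixture_env.min(), mixture_env.max())
```

Because env_a + env_b is constant (1.2) in this construction, the mixture envelope oscillates at 40 Hz between roughly 0.1 and 1.2 and shows neither the rising trend of source A nor the falling trend of source B.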
To address this in the coherent, constant-frequency AM case, we derive a set of matrix equations that describes the mixture, and we prove that a unique factorization exists under certain constraints. These constraints provide insight into the importance of onset cues in source separation. We develop algorithms for solving the system in those cases in which a unique solution exists. This work bears directly on the general theory of non-negative matrix factorization, which has recently been applied to various problems in biology and learning.

For the general incoherent AM and FM case, the situation is far more complex, because constructive and destructive interference between sources causes amplitude fluctuations within channels that obscure the modulation patterns of individual sources. Motivated by the importance of temporal processing in the auditory system, and specifically the use of extrema, we explore novel methods for estimating the instantaneous amplitude, frequency, and phase of mixtures of sinusoids by comparing the locations of local maxima in the waveforms of different frequency channels. By using an overlapping exponential filter bank model with properties resembling the cochlea, and combining information from multiple frequency bands, we are able to achieve extremely high frequency and time resolution. This allows us to isolate and track the behavior of individual spectral components, which can then be compared and grouped with others of like type. Our work includes both computational and analytic approaches to the general problem. Two suites of tests were performed. The first compared three filter-bank-based algorithms on sets of harmonic-like signals with constant frequencies. In the second, one of these algorithms was selected for further performance tests on more complex waveforms, including AM and FM signals of various types, harmonic sets in noise, and actual recordings of male and female speakers, both individual and mixed.
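The idea of reading frequency from the timing of waveform maxima can be sketched for the simplest case: one channel containing one sinusoid. This is a minimal illustration under stated assumptions, not the thesis's combined-channel algorithm. It omits the overlapping exponential filter bank and any cross-channel comparison, and the helper names `peak_times` and `extrema_frequency` are hypothetical. Each local maximum is located to sub-sample precision with a three-point parabolic fit (a standard interpolation choice assumed here), and the reciprocal of the interval between successive maxima gives a frequency estimate placed at the interval's midpoint.

```python
import numpy as np

def peak_times(x, fs):
    """Times (s) of local maxima of x, refined by a three-point parabolic fit."""
    x = np.asarray(x, dtype=float)
    # Interior samples above the left neighbour and not below the right one.
    idx = np.where((x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:]))[0] + 1
    y0, y1, y2 = x[idx - 1], x[idx], x[idx + 1]
    denom = y0 - 2.0 * y1 + y2
    safe = np.where(denom != 0.0, denom, 1.0)  # avoid dividing by zero on flat tops
    # Vertex of the parabola through (-1, y0), (0, y1), (1, y2), in samples.
    offset = np.where(denom != 0.0, 0.5 * (y0 - y2) / safe, 0.0)
    return (idx + offset) / fs

def extrema_frequency(x, fs):
    """Frequency estimates (Hz) from intervals between successive maxima.

    Returns (times, freqs); each estimate sits at the midpoint of its interval.
    """
    tp = peak_times(x, fs)
    return 0.5 * (tp[1:] + tp[:-1]), 1.0 / np.diff(tp)

if __name__ == "__main__":
    fs = 16000.0
    t = np.arange(int(0.1 * fs)) / fs
    x = np.sin(2 * np.pi * 440.0 * t)  # steady test tone
    times, freqs = extrema_frequency(x, fs)
    print(len(freqs), freqs.min(), freqs.max())
```

For a steady 440 Hz tone every estimate lies very close to 440 Hz. Applied to a single channel containing two nearby components, the peak intervals would generally fluctuate with the beat pattern, which is why a single-channel estimate of this kind is not sufficient on its own for mixtures.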
For the frequency-varying case, initial results of signal analysis with our methods appear to resolve individual sidebands of single harmonics on short time scales, and they raise interesting conceptual questions about how to define, use, and interpret instantaneous frequency. Based on our results, we revisit a number of questions in current auditory research, including the need for both rate and place coding, the asymmetric shapes of auditory filters, and a possible explanation for the difficulty hearing-impaired listeners experience in noise.

Extent: 303 p.
Rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission: http://dspace.mit.edu/handle/1721.1/7582