Cognitive Audio: Enabling Auditory Interfaces with an Understanding of How We Hear

Bibliographic Details
Main Author: Ananthabhotla, Ishwarya
Other Authors: Paradiso, Joseph A.
Department: Program in Media Arts and Sciences (Massachusetts Institute of Technology)
Format: Thesis
Degree: Ph.D.
Published: Massachusetts Institute of Technology, 2022
Online Access: https://hdl.handle.net/1721.1/143241
Rights: In Copyright - Educational Use Permitted (Copyright MIT), http://rightsstatements.org/page/InC-EDU/1.0/
Description

Over the last several decades, neuroscientists, cognitive scientists, and psychologists have made strides in understanding the complex and mysterious processes that define the interaction between our minds and the sounds around us. Some of these processes, particularly at the lowest levels of abstraction relative to a sound wave, are well understood and easy to characterize across large sections of the human population; others, however, are the sum of both intuition and observations drawn from small-scale laboratory experiments, and remain as yet poorly understood. In this thesis, I suggest that there is value in coupling insight into the workings of auditory processing, beginning with abstractions in pre-conscious processing, with new frontiers in interface design and state-of-the-art infrastructure for parsing and identifying sound objects, as a means of unlocking audio technologies that are much more immersive, naturalistic, and synergistic than those present in the existing landscape.

From the vantage point of today's computational models and devices, which largely represent audio at the level of the digital sample, I gesture towards a world of auditory interfaces that work deeply in concert with uniquely human tendencies, allowing us to altogether re-imagine how we capture, preserve, and experience bodies of sound -- towards, for example, augmented reality devices that manipulate sound objects to minimize distractions, lossy "codecs" that operate on semantic rather than time-frequency information, and soundscape design engines operating on large corpora of audio data that optimize for aesthetic or experiential outcomes instead of purely objective ones.

To do this, I aim to introduce and explore a new research direction, termed "Cognitive Audio", focused on the marriage of principles governing pre-conscious auditory cognition with traditional HCI approaches to auditory interface design via explicit statistical modeling. Along the way, I consider the major roadblocks that present themselves in approaching this convergence: I ask how we might "probe" and measure a cognitive principle of interest robustly enough to inform system design, in the absence of the immediately observable biophysical phenomena that may accompany, for example, visual cognition; I also ask how we might build reliable, meaningful statistical models from the resulting data that drive compelling experiences despite inherent noise, sparsity, and generalizations made at the level of the crowd.

I discuss early insights into these questions through the lens of a series of projects centered on auditory processing at different levels of abstraction. I begin with a discussion of early work focused on cognitive models of lower-level phenomena; these exercises then inform a comprehensive effort to construct general-purpose estimators of gestalt concepts in sound understanding. I then demonstrate the affordances of these estimators in the context of application systems that I construct and characterize, incorporating additional explorations on methods for personalization that sit atop these estimators. Finally, I conclude with a dialogue on the intersection between the key contributions in this dissertation and a string of major themes relevant to the audio technology and computation world today.