Evaluating Machine Learning Models of Sensory Systems

Bibliographic Details
Main Author: Feather, Jenelle
Other Authors: McDermott, Josh H.
Format: Thesis
Published: Massachusetts Institute of Technology 2024
Online Access: https://hdl.handle.net/1721.1/154183
https://orcid.org/0000-0001-9753-2393
Description
Summary: We rely on our sensory systems to perceive and interact with the world, and understanding how these systems work is a central focus in neuroscience. A goal of our field is to build stimulus-computable models of sensory systems that reproduce brain responses and behavior. The past decade has given rise to models that capture complex behaviors such as image classification, word recognition, and texture perception. Yet there are known discrepancies between such models and human observers, in their architectural components, learning mechanisms, and resulting representations, that must be rectified to obtain complete models of the brain. This dissertation investigates the representations in contemporary models of sensory systems, focusing on the auditory and visual systems. The first study explores the extent to which deep neural network audio models capture human fMRI responses to sound. Most tested models out-predicted previous hand-engineered models of auditory cortex and exhibited hierarchical brain-model correspondence. The second study investigates the invariances of visual and auditory models of perception using "model metamers": synthetic stimuli that produce the same activations in a model as a natural stimulus. Behavioral experiments with humans using these stimuli reveal that the invariances of most current computational neural network models of perception do not align with human perceptual invariances. Our experiments trace this discrepancy to invariances that are specific to individual models and provide some guidance on how to eliminate them. The third study applies techniques similar to those used to generate model metamers to auditory texture models, with the aim of reducing their dimensionality. We found that previous hand-engineered models of auditory texture can be substantially reduced in dimensionality without compromising their ability to capture human perception. The fourth study investigates the representational geometry of neural networks trained with biologically inspired stochasticity. Together, this work presents ways to compare the representations of neural networks to those of human perceptual systems and suggests paths for future improvements of these models.
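
The model metamer procedure mentioned in the summary can be illustrated with a short optimization loop: starting from noise, a synthetic stimulus is adjusted until its activations at a chosen model stage match those evoked by a natural stimulus. The following is a minimal sketch assuming a PyTorch model; the helper get_activations, the step count, and the learning rate are hypothetical choices for illustration and are not taken from the thesis.

    import torch

    def generate_metamer(model, get_activations, natural_input, steps=2000, lr=0.01):
        """Optimize a noise input so its activations at a chosen model stage
        match those of a natural stimulus (illustrative sketch only)."""
        model.eval()
        with torch.no_grad():
            target = get_activations(model, natural_input)          # activations evoked by the natural stimulus
        metamer = torch.randn_like(natural_input).requires_grad_()  # start from a noise stimulus
        optimizer = torch.optim.Adam([metamer], lr=lr)              # optimize the stimulus, not the model weights
        for _ in range(steps):
            optimizer.zero_grad()
            loss = torch.nn.functional.mse_loss(get_activations(model, metamer), target)
            loss.backward()
            optimizer.step()
        return metamer.detach()                                     # candidate model metamer for natural_input

In the behavioral experiments described above, a stimulus synthesized this way would then be presented to human observers to test whether they recognize it like the natural stimulus that produced the matched activations.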