Rapid Visual Object Learning in Humans is Explainable by Low-Dimensional Image Representations
How humans learn to recognize new objects is an open problem. In this thesis, we consider one class of theories for how this is accomplished: humans re-represent incoming retinal images in a stable, multidimensional Euclidean space, and build linear decoders in this space for new object categories from image exemplars.
Main Author: | Lee, Michael Jinsuk |
---|---|
Other Authors: | DiCarlo, James J. |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2023 |
Online Access: | https://hdl.handle.net/1721.1/147557 https://orcid.org/0000-0002-2576-6059 |
author | Lee, Michael Jinsuk |
author2 | DiCarlo, James J. |
collection | MIT |
description | How humans learn to recognize new objects is an open problem. In this thesis, we consider one class of theories for how this is accomplished: humans re-represent incoming retinal images in a stable, multidimensional Euclidean space, and build linear decoders in this space for new object categories from image exemplars.
In Part I, we empirically characterize human learning behavior over a battery of different learning subtasks, and find humans rapidly learn new objects from a small number of examples. We then build neurally-mechanistic, end-to-end models of object learning based on recent advances in image-computable models of ventral stream representations. We point to shortcomings of these models, including the fact that none of them matches the human ability to few-shot learn.
In Part II, we analyze this few-shot learning failure from a theoretical perspective, and show that a geometric property of image representations — variation in directions orthogonal to the one needed to linearly solve the task — slows learning. Given this observation, we motivate the hypothesis that current models of visual processing represent images along a much higher number of dimensions, relative to humans.
In Part III, we identify (and remove) these hypothesized excess dimensions by developing the "perceptual alignment" method, where we combine a classical approach in experimental psychology — inferring internal stimulus representations using measurements of human similarity judgements — with deep learning methods, and create new, lower-dimensional, image-computable representations which capture patterns of human similarity judgements. Finally, we show models based on these new representations predict the ability of humans to few-shot learn across a variety of object domains. They also successfully predict the inability of humans to learn tasks based on representational dimensions that are present in baseline models but absent in perceptually aligned ones. Taken together, this thesis shows specific, neurally-mechanistic models based on a simple theory of learning are strong accounts of how humans rapidly learn new objects. |
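The learning account in Parts I and II lends itself to a toy simulation: a linear decoder is built from a few exemplars in a Euclidean representation space, and variance along directions orthogonal to the task direction degrades few-shot accuracy. The sketch below is illustrative only (it uses synthetic Gaussian representations and a nearest-class-mean decoder, not the thesis's actual models or stimuli):

```python
import numpy as np

rng = np.random.default_rng(0)

def few_shot_accuracy(n_extra_dims, n_train=5, n_test=2000, sigma=1.0):
    """Train a prototype (nearest-class-mean) linear decoder on n_train
    exemplars per class and test it. The two classes differ only along
    one signal dimension; n_extra_dims task-orthogonal noise dimensions
    are appended to every representation."""
    d = 1 + n_extra_dims
    def sample(label, n):
        x = sigma * rng.standard_normal((n, d))
        x[:, 0] += 1.0 if label else -1.0   # class signal lives on dim 0 only
        return x
    # Class prototypes estimated from the few training exemplars.
    mu_a = sample(0, n_train).mean(axis=0)
    mu_b = sample(1, n_train).mean(axis=0)
    w = mu_b - mu_a                         # linear decoder direction
    b = -w @ (mu_a + mu_b) / 2              # midpoint decision threshold
    xs = np.vstack([sample(0, n_test), sample(1, n_test)])
    ys = np.r_[np.zeros(n_test), np.ones(n_test)]
    return ((xs @ w + b > 0) == ys).mean()

# Accuracy falls as task-orthogonal dimensions are added, even though
# the signal dimension is untouched.
for k in [0, 10, 100, 1000]:
    print(k, round(few_shot_accuracy(k), 3))
```

The orthogonal dimensions hurt because the decoder, estimated from only a few exemplars, picks up noise components along every excess dimension, diluting the signal direction.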
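Part III's perceptual-alignment idea can likewise be sketched: infer a low-dimensional embedding from similarity judgements with classical MDS (the "classical approach in experimental psychology"), then fit a linear readout from model features so the embedding becomes image-computable. All data, dimensions, and the linear readout below are hypothetical stand-ins for the thesis's actual method:

```python
import numpy as np

rng = np.random.default_rng(1)

def pairwise_dist(x):
    """Matrix of Euclidean distances between rows of x."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

n, d_model, d_percept = 100, 64, 4
feats = rng.standard_normal((n, d_model))   # stand-in model features

# Simulated human dissimilarity judgements: they depend on only a few of
# the model's dimensions, plus judgement noise.
human = pairwise_dist(feats[:, :d_percept]) + 0.05 * rng.standard_normal((n, n))
human = (human + human.T) / 2
np.fill_diagonal(human, 0.0)

# Step 1 (classical MDS): recover a low-dimensional embedding whose
# distances reproduce the judged dissimilarities.
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (human ** 2) @ J             # double-centered Gram matrix
evals, evecs = np.linalg.eigh(B)
top = np.argsort(evals)[::-1][:d_percept]   # largest eigenvalues
embed = evecs[:, top] * np.sqrt(evals[top])

# Step 2 (image-computable readout): a linear map from model features to
# the embedding yields a lower-dimensional, feature-driven representation.
W, *_ = np.linalg.lstsq(feats, embed, rcond=None)
aligned = feats @ W

iu = np.triu_indices(n, k=1)
r = np.corrcoef(pairwise_dist(aligned)[iu], human[iu])[0, 1]
print(round(r, 3))   # distances in the aligned space track human judgements
```

In this toy setup the aligned representation has far fewer dimensions than the baseline features yet reproduces the simulated similarity structure, mirroring the role the "perceptual alignment" method plays in the thesis.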
format | Thesis |
id | mit-1721.1/147557 |
institution | Massachusetts Institute of Technology |
publishDate | 2023 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/147557 2023-01-20T03:22:37Z Rapid Visual Object Learning in Humans is Explainable by Low-Dimensional Image Representations Lee, Michael Jinsuk DiCarlo, James J. Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences Ph.D. 2023-01-19T19:58:24Z 2023-01-19T19:58:24Z 2022-09 2022-09-28T17:20:31.043Z Thesis https://hdl.handle.net/1721.1/147557 https://orcid.org/0000-0002-2576-6059 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
title | Rapid Visual Object Learning in Humans is Explainable by Low-Dimensional Image Representations |
url | https://hdl.handle.net/1721.1/147557 https://orcid.org/0000-0002-2576-6059 |