Rapid Visual Object Learning in Humans is Explainable by Low-Dimensional Image Representations
How humans learn to recognize new objects is an open problem. In this thesis, we consider one class of theories for how this is accomplished: humans re-represent incoming retinal images in a stable, multidimensional Euclidean space, and build linear decoders in this space for new object categories from image exemplars.
Main Author: | Lee, Michael Jinsuk |
---|---|
Other Authors: | DiCarlo, James J. |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2023 |
Online Access: | https://hdl.handle.net/1721.1/147557 https://orcid.org/0000-0002-2576-6059 |
author | Lee, Michael Jinsuk |
author2 | DiCarlo, James J. |
collection | MIT |
description | How humans learn to recognize new objects is an open problem. In this thesis, we consider one class of theories for how this is accomplished: humans re-represent incoming retinal images in a stable, multidimensional Euclidean space, and build linear decoders in this space for new object categories from image exemplars.
In Part I, we empirically characterize human learning behavior over a battery of different learning subtasks, and find humans rapidly learn new objects from a small number of examples. We then build neurally-mechanistic, end-to-end models of object learning based on recent advances in image-computable models of ventral stream representations. We point to shortcomings of these models, including the fact that none of them matches the human ability to few-shot learn.
In Part II, we analyze this few-shot learning failure from a theoretical perspective, and show that a geometric property of image representations — variation in directions orthogonal to the one needed to linearly solve the task — slows learning. Given this observation, we motivate the hypothesis that current models of visual processing represent images along a much higher number of dimensions, relative to humans.
In Part III, we identify (and remove) these hypothesized excess dimensions by developing the "perceptual alignment" method, where we combine a classical approach in experimental psychology — inferring internal stimulus representations using measurements of human similarity judgements — with deep learning methods, and create new, lower-dimensional, image-computable representations which capture patterns of human similarity judgements. Finally, we show models based on these new representations predict the ability of humans to few-shot learn across a variety of object domains. They also successfully predict the inability of humans to learn tasks based on representational dimensions that are present in baseline models but absent in perceptually aligned ones. Taken together, this thesis shows specific, neurally-mechanistic models based on a simple theory of learning are strong accounts of how humans rapidly learn new objects. |
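The learning account in Parts I and II lends itself to a toy simulation: a linear decoder is built from a few exemplars in a Euclidean representation space, and variance along directions orthogonal to the task direction degrades few-shot accuracy. The sketch below is illustrative only (it uses synthetic Gaussian representations and a nearest-class-mean decoder, not the thesis's actual models or stimuli):

```python
import numpy as np

rng = np.random.default_rng(0)

def few_shot_accuracy(n_extra_dims, n_train=5, n_test=2000, sigma=1.0):
    """Train a prototype (nearest-class-mean) linear decoder on n_train
    exemplars per class and test it. The two classes differ only along
    one signal dimension; n_extra_dims task-orthogonal noise dimensions
    are appended to every representation."""
    d = 1 + n_extra_dims
    def sample(label, n):
        x = sigma * rng.standard_normal((n, d))
        x[:, 0] += 1.0 if label else -1.0   # class signal lives on dim 0 only
        return x
    # Class prototypes estimated from the few training exemplars.
    mu_a = sample(0, n_train).mean(axis=0)
    mu_b = sample(1, n_train).mean(axis=0)
    w = mu_b - mu_a                         # linear decoder direction
    b = -w @ (mu_a + mu_b) / 2              # midpoint decision threshold
    xs = np.vstack([sample(0, n_test), sample(1, n_test)])
    ys = np.r_[np.zeros(n_test), np.ones(n_test)]
    return ((xs @ w + b > 0) == ys).mean()

# Accuracy falls as task-orthogonal dimensions are added, even though
# the signal dimension is untouched.
for k in [0, 10, 100, 1000]:
    print(k, round(few_shot_accuracy(k), 3))
```

The orthogonal dimensions hurt because the decoder, estimated from only a few exemplars, picks up noise components along every excess dimension, diluting the signal direction.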
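Part III's perceptual-alignment idea can likewise be sketched: infer a low-dimensional embedding from similarity judgements with classical MDS (the "classical approach in experimental psychology"), then fit a linear readout from model features so the embedding becomes image-computable. All data, dimensions, and the linear readout below are hypothetical stand-ins for the thesis's actual method:

```python
import numpy as np

rng = np.random.default_rng(1)

def pairwise_dist(x):
    """Matrix of Euclidean distances between rows of x."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

n, d_model, d_percept = 100, 64, 4
feats = rng.standard_normal((n, d_model))   # stand-in model features

# Simulated human dissimilarity judgements: they depend on only a few of
# the model's dimensions, plus judgement noise.
human = pairwise_dist(feats[:, :d_percept]) + 0.05 * rng.standard_normal((n, n))
human = (human + human.T) / 2
np.fill_diagonal(human, 0.0)

# Step 1 (classical MDS): recover a low-dimensional embedding whose
# distances reproduce the judged dissimilarities.
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (human ** 2) @ J             # double-centered Gram matrix
evals, evecs = np.linalg.eigh(B)
top = np.argsort(evals)[::-1][:d_percept]   # largest eigenvalues
embed = evecs[:, top] * np.sqrt(evals[top])

# Step 2 (image-computable readout): a linear map from model features to
# the embedding yields a lower-dimensional, feature-driven representation.
W, *_ = np.linalg.lstsq(feats, embed, rcond=None)
aligned = feats @ W

iu = np.triu_indices(n, k=1)
r = np.corrcoef(pairwise_dist(aligned)[iu], human[iu])[0, 1]
print(round(r, 3))   # distances in the aligned space track human judgements
```

In this toy setup the aligned representation has far fewer dimensions than the baseline features yet reproduces the simulated similarity structure, mirroring the role the "perceptual alignment" method plays in the thesis.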
format | Thesis |
id | mit-1721.1/147557 |
institution | Massachusetts Institute of Technology |
publishDate | 2023 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/147557 2023-01-20T03:22:37Z Rapid Visual Object Learning in Humans is Explainable by Low-Dimensional Image Representations Lee, Michael Jinsuk DiCarlo, James J. Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences Ph.D. 2023-01-19T19:58:24Z 2023-01-19T19:58:24Z 2022-09 2022-09-28T17:20:31.043Z Thesis https://hdl.handle.net/1721.1/147557 https://orcid.org/0000-0002-2576-6059 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
title | Rapid Visual Object Learning in Humans is Explainable by Low-Dimensional Image Representations |
url | https://hdl.handle.net/1721.1/147557 https://orcid.org/0000-0002-2576-6059 |