Modeling the Geometry of Neural Network Representation Spaces

Bibliographic Details
Main Author: Robinson, Joshua David
Other Authors: Jegelka, Stefanie
Format: Thesis
Published: Massachusetts Institute of Technology 2023
Online Access: https://hdl.handle.net/1721.1/152692
collection MIT
description Neural networks automate the process of representing objects and their relations on a computer, including everything from household items to molecules. New representations are obtained by transforming different instances into a shared representation space, where variations in data can be measured using simple geometric quantities such as Euclidean distances. This thesis studies the geometric structure of this space and its influence on key properties of the learning process, including how much data is needed to acquire new skills, when predictions will fail, and the computational cost of learning. We examine two foundational aspects of the geometry of neural network representations. Part I designs and studies learning algorithms that take into account the location of data in representation space. Focusing on contrastive self-supervised learning, we design a) hard instance sampling strategies and b) methods for controlling what features models learn. Each produces improvements in key characteristics, such as training speed, generalization, and model reliability. Part II studies how to use non-Euclidean geometries to build network architectures that respect symmetries and structures arising in physical data, providing a powerful inductive bias for learning. Specifically, we use geometric spaces such as the real projective plane and the spectraplex to build a) provably powerful neural networks that respect the symmetries of eigenvectors, which is important for building Transformers on graph structured data, and b) neural networks that solve combinatorial optimization problems on graphs such as finding big cliques or small cuts, which arise in molecular engineering and network science.
first_indexed 2024-09-23T15:12:01Z
format Thesis
id mit-1721.1/152692
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T15:12:01Z
publishDate 2023
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/152692 2023-11-03T03:12:33Z Modeling the Geometry of Neural Network Representation Spaces Robinson, Joshua David Jegelka, Stefanie Sra, Suvrit Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Ph.D.
2023-11-02T20:08:51Z 2023-11-02T20:08:51Z 2023-09 2023-09-21T14:26:32.105Z Thesis https://hdl.handle.net/1721.1/152692 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
title Modeling the Geometry of Neural Network Representation Spaces
url https://hdl.handle.net/1721.1/152692