Summary: | Neural networks automate the process of representing objects and their relations on a computer, from household items to molecules. New representations are obtained by transforming different instances into a shared representation space, where variations in data can be measured using simple geometric quantities such as Euclidean distances. This thesis studies the geometric structure of this space and its influence on key properties of the learning process, including how much data is needed to acquire new skills, when predictions will fail, and the computational cost of learning. We examine two foundational aspects of the geometry of neural network representations.
Part I designs and studies learning algorithms that take into account the location of data in representation space. Focusing on contrastive self-supervised learning, we design a) hard instance sampling strategies and b) methods for controlling which features models learn. Each improves key characteristics such as training speed, generalization, and model reliability.
Part II studies how to use non-Euclidean geometries to build network architectures that respect symmetries and structures arising in physical data, providing a powerful inductive bias for learning. Specifically, we use geometric spaces such as the real projective plane and the spectraplex to build a) provably powerful neural networks that respect the symmetries of eigenvectors, which is important for building Transformers on graph-structured data, and b) neural networks that solve combinatorial optimization problems on graphs, such as finding large cliques or small cuts, which arise in molecular engineering and network science.
|