Characterizations of how neural networks learn
Training neural network architectures on Internet-scale datasets has led to many recent advances in machine learning. However, the mechanisms underlying how neural networks learn from data are largely opaque. This thesis develops a mechanistic understanding of how neural networks learn in several settings, as well as new tools to analyze trained networks. First, we study data where the labels depend on an unknown low-dimensional subspace of the input (i.e., the multi-index setting). We identify the “leap complexity”, which is a quantity that we argue characterizes how much data networks need in order to learn. Our analysis reveals a saddle-to-saddle dynamic in network training, where training alternates between loss plateaus and sharp drops in the loss. Furthermore, we show that network weights evolve such that the trained weights are a low-rank perturbation of the original weights. We observe this effect empirically in state-of-the-art transformer models trained on image and vision data. Second, we study the ability of language models to learn to reason. On a family of “relational reasoning” tasks, we prove that modern transformers learn to reason with enough data, but classical fully-connected architectures do not. Our analysis suggests small architectural modifications that improve data efficiency. Finally, we construct new tools to interpret trained networks. These are: (a) a definition of distance between two models that captures their functional similarity, and (b) a distillation algorithm to efficiently extract interpretable decision-tree structure from a trained model when possible.
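The abstract's claim that trained weights end up as a low-rank perturbation of the initial weights can be illustrated with a few lines of linear algebra. The sketch below is not code from the thesis: `W_init` and `W_trained` are random stand-ins for the same weight matrix read from two checkpoints of a real model, and the 99% energy threshold is an arbitrary choice of effective-rank proxy.

```python
# Hedged sketch: checking whether training leaves a weight matrix as a
# low-rank perturbation of its initialization. The matrices below are
# random placeholders; in practice they would come from real checkpoints.
import numpy as np

rng = np.random.default_rng(0)
d = 256

W_init = rng.standard_normal((d, d)) / np.sqrt(d)             # placeholder "initial" weights
low_rank_update = rng.standard_normal((d, 4)) @ rng.standard_normal((4, d)) / d
W_trained = W_init + low_rank_update                           # placeholder "trained" weights

# Spectrum of the weight change: if training only moves the weights in a
# few directions, the singular values of the difference decay sharply.
delta = W_trained - W_init
s = np.linalg.svd(delta, compute_uv=False)

# Simple effective-rank proxy: how many singular values carry 99% of the energy.
energy = np.cumsum(s**2) / np.sum(s**2)
effective_rank = int(np.searchsorted(energy, 0.99) + 1)
print(f"effective rank of weight update: {effective_rank} (out of {d})")
```

With real checkpoints one would load a specific weight matrix (for instance an attention projection) in place of the random stand-ins; a sharp drop in the singular values of the difference is the signature the abstract describes.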
Main Author: | Boix-Adsera, Enric |
---|---|
Other Authors: | Bresler, Guy; Rigollet, Philippe |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2024 |
Online Access: | https://hdl.handle.net/1721.1/156306 |
author | Boix-Adsera, Enric |
author2 | Bresler, Guy |
collection | MIT |
description | Training neural network architectures on Internet-scale datasets has led to many recent advances in machine learning. However, the mechanisms underlying how neural networks learn from data are largely opaque. This thesis develops a mechanistic understanding of how neural networks learn in several settings, as well as new tools to analyze trained networks. First, we study data where the labels depend on an unknown low-dimensional subspace of the input (i.e., the multi-index setting). We identify the “leap complexity”, which is a quantity that we argue characterizes how much data networks need in order to learn. Our analysis reveals a saddle-to-saddle dynamic in network training, where training alternates between loss plateaus and sharp drops in the loss. Furthermore, we show that network weights evolve such that the trained weights are a low-rank perturbation of the original weights. We observe this effect empirically in state-of-the-art transformer models trained on image and vision data. Second, we study the ability of language models to learn to reason. On a family of “relational reasoning” tasks, we prove that modern transformers learn to reason with enough data, but classical fully-connected architectures do not. Our analysis suggests small architectural modifications that improve data efficiency. Finally, we construct new tools to interpret trained networks. These are: (a) a definition of distance between two models that captures their functional similarity, and (b) a distillation algorithm to efficiently extract interpretable decision-tree structure from a trained model when possible. |
format | Thesis |
id | mit-1721.1/156306 |
institution | Massachusetts Institute of Technology |
publishDate | 2024 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
advisor | Bresler, Guy; Rigollet, Philippe
department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
degree | Ph.D.
date_issued | 2024-05
rights | Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0); Copyright retained by author(s); https://creativecommons.org/licenses/by-nc-nd/4.0/
file_format | application/pdf
title | Characterizations of how neural networks learn |
url | https://hdl.handle.net/1721.1/156306 |