On kernel and feature learning in neural networks

Inspired by the theory of wide neural networks (NNs), kernel learning and feature learning have recently emerged as two paradigms through which we can understand the complex behaviours of large-scale deep learning systems in practice. In the literature, they are often portrayed as two opposing ends of a dichotomy, each with its own strengths and weaknesses: kernel learning draws connections to well-studied machine learning techniques such as kernel methods and Gaussian Processes, whereas feature learning promises to capture more of the rich, but as yet unexplained, properties that are unique to NNs.

In this thesis, we present three works studying properties of NNs that combine insights from both perspectives, highlighting not only their differences but also shared similarities. We start by reviewing relevant literature on the theory of deep learning, with a focus on the study of wide NNs. This provides context for a discussion of kernel and feature learning, and against this backdrop, we proceed to describe our contributions. First, we examine the relationship between ensembles of wide NNs and Bayesian inference using connections from kernel learning to Gaussian Processes, and propose a modification that accounts for missing variance at initialisation in NN functions, resulting in a Bayesian interpretation of our trained deep ensembles. Next, we combine kernel and feature learning to demonstrate the suitability of the feature kernel, i.e. the kernel induced by inner products over final-layer NN features, as a target for knowledge distillation, where one seeks to use a powerful teacher model to improve the performance of a weaker student model. Finally, we explore the gap between collapsed and whitened features in self-supervised learning, highlighting the decay rate of eigenvalues in the feature kernel as a key quantity that bridges this gap and impacts downstream generalisation performance, especially in settings with scarce labelled data. We conclude with a discussion of our contributions, including limitations and future outlook.
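As a rough illustration of the central object named in the abstract, the hypothetical sketch below forms the feature kernel, i.e. the Gram matrix of inner products between final-layer features, and inspects how quickly its eigenvalues decay. The random features and all names here are stand-ins, not code from the thesis.

```python
# Hypothetical sketch only: builds the "feature kernel" described in the abstract
# (inner products over final-layer NN features) and inspects its eigenvalue decay,
# the quantity the abstract links to collapsed vs. whitened features. The random
# matrix below stands in for a trained network's final-layer features.
import numpy as np

def feature_kernel(phi: np.ndarray) -> np.ndarray:
    """Gram matrix K[i, j] = <phi_i, phi_j> over final-layer features."""
    return phi @ phi.T

rng = np.random.default_rng(0)
phi = rng.standard_normal((128, 64))   # 128 inputs, 64-dimensional features (stand-in)

K = feature_kernel(phi)
eigvals = np.linalg.eigvalsh(K)[::-1]  # eigenvalues sorted from largest to smallest
print(eigvals[:5] / eigvals[0])        # normalised leading eigenvalues
```

A rapidly decaying spectrum corresponds, loosely, to the collapsed regime and a flat spectrum to whitened features, which is the sense in which the abstract describes the decay rate as bridging the two.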

Bibliographic Details
Main Author: He, B
Other Authors: Teh, Y-W; Doucet, A; Deligiannidis, G
Format: Thesis
Language: English
Published: 2022
Subjects: Deep learning (Machine learning)
Institution: University of Oxford