On kernel and feature learning in neural networks

Inspired by the theory of wide neural networks (NNs), kernel learning and feature learning have recently emerged as two paradigms through which we can understand the complex behaviours of large-scale deep learning systems in practice. In the literature, they are often portrayed as two opposing ends of a dichotomy, each with its own strengths and weaknesses: kernel learning draws connections to well-studied machine learning techniques such as kernel methods and Gaussian Processes, whereas feature learning promises to capture more of the rich, but as yet unexplained, properties that are unique to NNs.

In this thesis, we present three works studying properties of NNs that combine insights from both perspectives, highlighting not only their differences but also shared similarities. We start by reviewing relevant literature on the theory of deep learning, with a focus on the study of wide NNs. This provides context for a discussion of kernel and feature learning, and against this backdrop, we proceed to describe our contributions. First, we examine the relationship between ensembles of wide NNs and Bayesian inference using connections from kernel learning to Gaussian Processes, and propose a modification that accounts for missing variance at initialisation in NN functions, resulting in a Bayesian interpretation of our trained deep ensembles. Next, we combine kernel and feature learning to demonstrate the suitability of the feature kernel, i.e. the kernel induced by inner products over final-layer NN features, as a target for knowledge distillation, where one seeks to use a powerful teacher model to improve the performance of a weaker student model. Finally, we explore the gap between collapsed and whitened features in self-supervised learning, highlighting the decay rate of eigenvalues in the feature kernel as a key quantity that bridges this gap and impacts downstream generalisation performance, especially in settings with scarce labelled data. We conclude with a discussion of our contributions, including limitations and future outlook.
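As a rough illustration of the central object named in the abstract, the hypothetical sketch below forms the feature kernel, i.e. the Gram matrix of inner products between final-layer features, and inspects how quickly its eigenvalues decay. The random features and all names here are stand-ins, not code from the thesis.

```python
# Hypothetical sketch only: builds the "feature kernel" described in the abstract
# (inner products over final-layer NN features) and inspects its eigenvalue decay,
# the quantity the abstract links to collapsed vs. whitened features. The random
# matrix below stands in for a trained network's final-layer features.
import numpy as np

def feature_kernel(phi: np.ndarray) -> np.ndarray:
    """Gram matrix K[i, j] = <phi_i, phi_j> over final-layer features."""
    return phi @ phi.T

rng = np.random.default_rng(0)
phi = rng.standard_normal((128, 64))   # 128 inputs, 64-dimensional features (stand-in)

K = feature_kernel(phi)
eigvals = np.linalg.eigvalsh(K)[::-1]  # eigenvalues sorted from largest to smallest
print(eigvals[:5] / eigvals[0])        # normalised leading eigenvalues
```

A rapidly decaying spectrum corresponds, loosely, to the collapsed regime and a flat spectrum to whitened features, which is the sense in which the abstract describes the decay rate as bridging the two.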

Bibliographic Details
Main Author: He, B
Other Authors: Teh, Y-W; Doucet, A; Deligiannidis, G
Format: Thesis
Language: English
Published: 2022
Subjects: Deep learning (Machine learning)
Institution: University of Oxford