Double-descent curves in neural networks: a new perspective using Gaussian processes

Double-descent curves in neural networks describe the phenomenon whereby the generalisation error first decreases as the number of parameters grows, then rises after an optimal parameter count that is smaller than the number of data points, and finally decreases again in the overparameterised regime. In this paper, we use techniques from random matrix theory to characterise the spectral distribution of the empirical feature covariance matrix as a width-dependent perturbation of the spectrum of the neural network Gaussian process (NNGP) kernel, thus establishing a novel connection between the NNGP literature and the random matrix theory literature in the context of neural networks. Our analytical expressions allow us to explore the generalisation behaviour of the corresponding kernel and GP regression. Furthermore, they offer a new interpretation of double descent in terms of the discrepancy between the width-dependent empirical kernel and the width-independent NNGP kernel.
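
To make the width dependence concrete, the following minimal sketch (illustrative code, not from the paper) builds the empirical feature covariance matrix of a random one-hidden-layer ReLU network at several widths and measures its discrepancy from the closed-form arc-cosine NNGP kernel. The architecture, kernel choice, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 5                                  # number of inputs, input dimension
X = rng.standard_normal((n, d)) / np.sqrt(d)  # toy dataset

def empirical_kernel(X, width):
    """Empirical feature covariance (1/width) * phi(X W) phi(X W)^T
    for random Gaussian weights W and ReLU activations."""
    W = rng.standard_normal((X.shape[1], width))
    phi = np.maximum(X @ W, 0.0)  # post-activation features
    return phi @ phi.T / width

def nngp_relu_kernel(X):
    """Closed-form infinite-width NNGP kernel for one ReLU layer
    (the order-1 arc-cosine kernel of Cho and Saul)."""
    G = X @ X.T
    norms = np.sqrt(np.diag(G))
    cos = np.clip(G / np.outer(norms, norms), -1.0, 1.0)
    theta = np.arccos(cos)
    return (np.outer(norms, norms) / (2 * np.pi)) * (
        np.sin(theta) + (np.pi - theta) * np.cos(theta)
    )

K_nngp = nngp_relu_kernel(X)
for width in (10, 100, 1000, 10000):
    K_hat = empirical_kernel(X, width)
    gap = np.linalg.norm(K_hat - K_nngp) / np.linalg.norm(K_nngp)
    print(f"width={width:>5d}  relative Frobenius gap ~ {gap:.3f}")
```

By the law of large numbers the empirical kernel converges entrywise to the NNGP kernel as the width grows, so the printed gap shrinks with width; it is this width-dependent discrepancy at finite width that the paper's interpretation of double descent turns on.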

Bibliographic Details
Main Authors: El Harzli, O, Cuenca Grau, B, Valle-Pérez, G, Louis, AA
Format: Conference item
Language: English
Published: Association for the Advancement of Artificial Intelligence, 2024
Institution: University of Oxford