Deep neural networks have an inbuilt Occam’s razor

The remarkable performance of overparameterized deep neural networks (DNNs) must arise from an interplay between network architecture, training algorithms, and structure in the data. To disentangle these three components for supervised learning, we apply a Bayesian picture based on the functions expressed by a DNN. The prior over functions is determined by the network architecture, which we vary by exploiting a transition between ordered and chaotic regimes. For Boolean function classification, we approximate the likelihood using the error spectrum of functions on data. Combining this with the prior yields an accurate prediction for the posterior, measured for DNNs trained with stochastic gradient descent. This analysis shows that structured data, together with a specific Occam’s razor-like inductive bias towards (Kolmogorov) simple functions that exactly counteracts the exponential growth of the number of functions with complexity, is a key to the success of DNNs.

Bibliographic Details
Main Authors: Mingard, C, Rees, H, Valle-Pérez, G, Louis, AA
Format: Journal article
Language: English
Published: Nature Research, 2025
Institution: University of Oxford
Record ID: oxford-uuid:1907a7d3-ca17-4154-aa25-71fd5521d063