Benign overfitting and noisy features

Modern machine learning often operates in a regime where the number of parameters far exceeds the number of data points, achieving zero training loss and yet good generalization, thereby contradicting the classical bias-variance trade-off. This \textit{benign overfitting} phenomenon has recently been characterized using so-called \textit{double descent} curves, in which the risk undergoes a second descent (in addition to the classical U-shaped learning curve observed when the number of parameters is small) as the number of parameters grows beyond a certain threshold. In this paper, we examine the conditions under which benign overfitting occurs in random feature (RF) models, i.e. two-layer neural networks with fixed first-layer weights. We adopt a new view of random features and show that benign overfitting arises due to the noise residing in such features (noise that may already be present in the data and propagate to the features, or that may be added by the user to the features directly), which plays an important implicit regularization role in the phenomenon.
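The sketch below is not the paper's experiment, only a minimal illustration of the setup the abstract describes: a random feature model (two-layer network with fixed random first-layer weights) fitted by min-norm least squares, with the number of features swept past the number of samples. The choice of ReLU features, Gaussian data, the helper name `rf_risk`, and the `feature_noise` parameter are all illustrative assumptions, not details taken from the article.

```python
# Minimal sketch (assumptions noted above): ridgeless random feature (RF)
# regression, i.e. a two-layer network with fixed random first-layer weights
# and a min-norm least-squares fit of the second layer. Sweeping the number
# of features past the number of samples typically traces a double-descent
# test-risk curve; additive feature noise acts as an implicit regularizer.
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test, noise_std = 20, 100, 2000, 0.1

# Ground-truth linear target with label noise on the training set.
beta = rng.normal(size=d) / np.sqrt(d)
X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
y_train = X_train @ beta + noise_std * rng.normal(size=n_train)
y_test = X_test @ beta

def rf_risk(n_features, feature_noise=0.0):
    """Test MSE of min-norm RF regression, with optional additive feature noise."""
    W = rng.normal(size=(d, n_features)) / np.sqrt(d)   # fixed first-layer weights
    phi = lambda Z: np.maximum(Z @ W, 0.0)              # ReLU random features
    F_train = phi(X_train) + feature_noise * rng.normal(size=(n_train, n_features))
    F_test = phi(X_test)
    theta = np.linalg.pinv(F_train) @ y_train           # min-norm interpolating fit
    return np.mean((F_test @ theta - y_test) ** 2)

for p in [20, 50, 90, 100, 110, 200, 500, 2000]:
    print(f"features={p:5d}  test MSE={rf_risk(p):.3f}  "
          f"with feature noise={rf_risk(p, feature_noise=0.3):.3f}")
```

Under these assumptions the test risk typically peaks near the interpolation threshold (number of features close to the number of training points) and decreases again as the model is overparameterized, while the noisy-feature variant damps the peak, in line with the implicit-regularization role of feature noise described in the abstract.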

Bibliographic Details
Main Authors: Li, Z., Su, W., Sejdinovic, D.
Format: Journal article
Language: English
Published: Taylor and Francis, 2022