Benign overfitting and noisy features
Modern machine learning often operates in the regime where the number of parameters is much higher than the number of data points, with zero training loss and yet good generalization, thereby contradicting the classical bias-variance trade-off. This \textit{benign overfitting} phenomenon has recently been characterized using so-called \textit{double descent} curves, where the risk undergoes another descent (in addition to the classical U-shaped learning curve when the number of parameters is small) as the number of parameters increases beyond a certain threshold. In this paper, we examine the conditions under which \textit{benign overfitting} occurs in random feature (RF) models, i.e. in a two-layer neural network with fixed first-layer weights. We adopt a new view of random features and show that \textit{benign overfitting} arises due to the noise residing in such features (noise which may already be present in the data and propagate to the features, or which may be added by the user to the features directly), which plays an important implicit regularization role in the phenomenon.
Main Authors: | Li, Z; Su, W; Sejdinovic, D |
---|---|
Format: | Journal article |
Language: | English |
Published: | Taylor and Francis, 2022 |
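The setting the abstract describes can be made concrete with a small sketch. The following is a minimal illustration, not code from the paper: ridgeless (minimum-norm) regression on random ReLU features, with a hypothetical `feature_noise` knob standing in for noise "added by the user to the features directly". Sweeping the number of features p past the sample size n traces a double-descent-shaped risk curve, and the added feature noise behaves like an implicit ridge penalty.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: n training points in d dimensions, noisy linear target.
n, d, n_test = 100, 20, 1000
X = rng.standard_normal((n, d))
X_test = rng.standard_normal((n_test, d))
w_star = rng.standard_normal(d) / np.sqrt(d)
y = X @ w_star + 0.1 * rng.standard_normal(n)
y_test = X_test @ w_star


def rf_risk(p, feature_noise=0.0):
    """Test risk of min-norm least squares on p random ReLU features.

    `feature_noise` (a hypothetical knob, not the paper's notation) is the
    standard deviation of Gaussian noise added to the training features only.
    """
    W = rng.standard_normal((d, p)) / np.sqrt(d)   # fixed first-layer weights
    Phi = np.maximum(X @ W, 0.0)                   # random ReLU features
    Phi = Phi + feature_noise * rng.standard_normal(Phi.shape)
    Phi_test = np.maximum(X_test @ W, 0.0)         # clean features at test time
    theta = np.linalg.pinv(Phi) @ y                # minimum-norm interpolant
    return float(np.mean((Phi_test @ theta - y_test) ** 2))


# Sweeping p past n = 100 traces a double-descent-shaped risk curve;
# feature noise regularizes implicitly, taming the peak near p = n.
for p in (10, 50, 100, 200, 1000):
    print(f"p={p:4d}  clean={rf_risk(p):.3f}  noisy={rf_risk(p, 0.5):.3f}")
```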
collection | OXFORD |
---|---|
id | oxford-uuid:cdf8ecbc-d8a1-433b-9edd-d37043f00abd |
institution | University of Oxford |