Classical generalization bounds are surprisingly tight for Deep Networks

Full description

Deep networks are usually trained and tested in a regime in which the training classification error is not a good predictor of the test error. The consensus has therefore been that generalization, defined as convergence of the empirical to the expected error, does not hold for deep networks. Here we show that, when normalized appropriately after training, deep networks trained on exponential-type losses exhibit a good linear dependence of test loss on training loss. This observation, motivated by a previous theoretical analysis of overparameterization and overfitting, not only demonstrates the validity of classical generalization bounds for deep learning but suggests that they are tight. In addition, we show that the bound on the classification error given by the normalized cross-entropy loss is empirically rather tight on the data sets we studied.
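As a rough illustration of the procedure the description refers to, the sketch below normalizes a toy bias-free ReLU network after "training" by dividing each weight matrix by its Frobenius norm and then evaluates the cross-entropy loss on the rescaled logits. The normalization scheme, network, and data here are assumptions made for illustration only, not the report's exact construction, and the toy weights are random rather than trained, so no relationship between the printed losses should be read into the numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, weights):
    """Forward pass of a bias-free ReLU network with a linear last layer."""
    h = x
    for W in weights[:-1]:
        h = relu(h @ W)
    return h @ weights[-1]  # logits

def cross_entropy(logits, labels):
    """Mean cross-entropy of softmax(logits) against integer labels."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy "trained" 3-layer network and toy data (illustrative only).
weights = [rng.normal(size=(20, 64)), rng.normal(size=(64, 64)), rng.normal(size=(64, 10))]
x, y = rng.normal(size=(128, 20)), rng.integers(0, 10, size=128)

# Hypothetical normalization: divide each weight matrix by its Frobenius norm.
# For a bias-free ReLU network this rescales the logits by a positive constant
# (positive homogeneity), so the predicted classes do not change.
normalized = [W / np.linalg.norm(W) for W in weights]

print("cross-entropy, raw weights       :", cross_entropy(forward(x, weights), y))
print("cross-entropy, normalized weights:", cross_entropy(forward(x, normalized), y))
print("classification error (unchanged) :",
      (forward(x, normalized).argmax(axis=1) != y).mean())
```

Dividing by the Frobenius norm is just one convenient choice of positive rescaling; any per-layer positive constants would leave the classification decisions, and hence the classification error, unchanged, while the cross-entropy loss itself does change with the scale.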

Bibliographic Details
Main Authors: Liao, Qianli; Miranda, Brando; Hidary, Jack; Poggio, Tomaso
Format: Technical Report
Language: en_US
Published: Center for Brains, Minds and Machines (CBMM), 2018
Series: CBMM Memo Series; 091
Institution: Massachusetts Institute of Technology
Online Access: http://hdl.handle.net/1721.1/116911
Funding: This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.