Cross-validation Stability of Deep Networks
Recent theoretical results show that gradient descent on deep neural networks under exponential loss functions locally maximizes classification margin, which is equivalent to minimizing the norm of the weight matrices under margin constraints. This property of the solution, however, does not fully characterize the generalization performance. We motivate theoretically and show empirically that the area under the curve of the margin distribution on the training set is in fact a good measure of generalization. We then show that, after data separation is achieved, it is possible to dynamically reduce the training set by more than 99% without significant loss of performance. Interestingly, the resulting subset of “high capacity” features is not consistent across different training runs, which is consistent with the theoretical claim that all training points should converge to the same asymptotic margin under SGD and in the presence of both batch normalization and weight decay.
Main Authors: | Banburski, Andrzej; De La Torre, Fernanda; Plant, Nishka; Shastri, Ishana; Poggio, Tomaso |
---|---|
Format: | Technical Report |
Published: | Center for Brains, Minds and Machines (CBMM), 2021 |
Online Access: | https://hdl.handle.net/1721.1/129744 |
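The central quantity in the abstract is the margin distribution of a trained network on its training set and the area under its curve. As a purely illustrative aid (not code from the report), the sketch below shows one way to compute a multiclass margin for every training point and the area under the sorted-margin curve; the `model`, `loader`, and the scale normalization are assumptions, and the authors' exact normalization may differ.

```python
# Illustrative sketch only: margin distribution and the area under its
# sorted-margin curve for a trained multiclass classifier.
# `model`, `loader`, and the normalization choice are assumptions,
# not details taken from the report.
import numpy as np
import torch

def margins(model, loader, device="cpu"):
    """Multiclass margin f(x)_y - max_{j != y} f(x)_j for each training point."""
    model.eval()
    out = []
    with torch.no_grad():
        for x, y in loader:
            logits = model(x.to(device))
            y = y.to(device).view(-1, 1)
            true_class = logits.gather(1, y).squeeze(1)      # f(x)_y
            logits = logits.scatter(1, y, float("-inf"))     # mask the true class
            runner_up = logits.max(dim=1).values             # max over j != y of f(x)_j
            out.append((true_class - runner_up).cpu().numpy())
    return np.concatenate(out)

def margin_curve_area(m):
    """Area under the curve of the sorted, rescaled margins (trapezoid rule)."""
    m = np.sort(m) / (np.abs(m).max() + 1e-12)   # crude rescaling; an assumption
    x = np.linspace(0.0, 1.0, num=len(m))
    return float(np.trapz(m, x))
```

With a trained network `net` and its training `DataLoader`, `margin_curve_area(margins(net, train_loader))` would yield a single scalar that can be compared across training runs.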
author | Banburski, Andrzej; De La Torre, Fernanda; Plant, Nishka; Shastri, Ishana; Poggio, Tomaso |
collection | MIT |
description | Recent theoretical results show that gradient descent on deep neural networks under exponential loss functions locally maximizes classification margin, which is equivalent to minimizing the norm of the weight matrices under margin constraints. This property of the solution, however, does not fully characterize the generalization performance. We motivate theoretically and show empirically that the area under the curve of the margin distribution on the training set is in fact a good measure of generalization. We then show that, after data separation is achieved, it is possible to dynamically reduce the training set by more than 99% without significant loss of performance. Interestingly, the resulting subset of “high capacity” features is not consistent across different training runs, which is consistent with the theoretical claim that all training points should converge to the same asymptotic margin under SGD and in the presence of both batch normalization and weight decay. |
format | Technical Report |
id | mit-1721.1/129744 |
institution | Massachusetts Institute of Technology |
publishDate | 2021 |
publisher | Center for Brains, Minds and Machines (CBMM) |
date issued | 2021-02-09 |
series | CBMM Memo; 115 |
note | This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. |
title | Cross-validation Stability of Deep Networks |
url | https://hdl.handle.net/1721.1/129744 |