Predicting Parameters in Deep Learning

We demonstrate that there is significant redundancy in the parameterization of several deep learning models. Given only a few weight values for each feature it is possible to accurately predict the remaining values. Moreover, we show that not only can the parameter values be predicted, but many of them need not be learned at all. We train several different architectures by learning only a small number of weights and predicting the rest. In the best case we are able to predict more than 95% of the weights of a network without any drop in accuracy.

Bibliographic Details
Main Authors: Denil, M, Shakibi, B, Dinh, L, Ranzato, M, de Freitas, N
Format: Conference item
Published: 2013
collection OXFORD
description We demonstrate that there is significant redundancy in the parameterization of several deep learning models. Given only a few weight values for each feature it is possible to accurately predict the remaining values. Moreover, we show that not only can the parameter values be predicted, but many of them need not be learned at all. We train several different architectures by learning only a small number of weights and predicting the rest. In the best case we are able to predict more than 95% of the weights of a network without any drop in accuracy.
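The redundancy claim in the description rests on weight matrices being approximately low rank: if W ≈ UV, then observing only a few columns of W is enough to recover a basis U and predict every remaining entry. A minimal NumPy sketch of this idea (an illustration only, not the paper's exact training procedure; the exactly-low-rank setup and all variable names here are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 64, 256, 4                 # feature dim, number of features, true rank
# Ground-truth weights, constructed to be exactly rank r for illustration.
W = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))

# "Learn" only k of the m columns (a small subset of the weights).
k = 16
idx = rng.choice(m, size=k, replace=False)
W_obs = W[:, idx]

# Recover a basis for the column space from the observed columns,
# then predict all columns as coefficients in that basis.
# (In the paper the coefficients are learned; here we project W directly.)
U, _, _ = np.linalg.svd(W_obs, full_matrices=False)
U = U[:, :r]
V = U.T @ W
W_pred = U @ V

rel_err = np.linalg.norm(W - W_pred) / np.linalg.norm(W)
stored_fraction = (n * k) / (n * m)
print(f"stored {stored_fraction:.1%} of weights, relative error {rel_err:.2e}")
```

With k = 16 observed columns out of m = 256, only about 6% of the weights are stored and the rest are predicted essentially exactly, mirroring the paper's best case of predicting more than 95% of a network's weights.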
format Conference item
id oxford-uuid:24eb6b3a-d833-4f13-93fb-12277843891b
institution University of Oxford
publishDate 2013
department Department of Computer Science
resource_type http://purl.org/coar/resource_type/c_5794