Predicting Parameters in Deep Learning

We demonstrate that there is significant redundancy in the parameterization of several deep learning models. Given only a few weight values for each feature it is possible to accurately predict the remaining values. Moreover, we show that not only can the parameter values be predicted, but many of them need not be learned at all. We train several different architectures by learning only a small number of weights and predicting the rest. In the best case we are able to predict more than 95% of the weights of a network without any drop in accuracy.

Bibliographic Details
Main Authors: Denil, M, Shakibi, B, Dinh, L, Ranzato, M, de Freitas, N
Format: Conference item
Published: 2013
collection OXFORD
description We demonstrate that there is significant redundancy in the parameterization of several deep learning models. Given only a few weight values for each feature it is possible to accurately predict the remaining values. Moreover, we show that not only can the parameter values be predicted, but many of them need not be learned at all. We train several different architectures by learning only a small number of weights and predicting the rest. In the best case we are able to predict more than 95% of the weights of a network without any drop in accuracy.
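The redundancy claim in the description rests on weight matrices being approximately low rank: if W ≈ UV, then observing only a few columns of W is enough to recover a basis U and predict every remaining entry. A minimal NumPy sketch of this idea (an illustration only, not the paper's exact training procedure; the exactly-low-rank setup and all variable names here are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 64, 256, 4                 # feature dim, number of features, true rank
# Ground-truth weights, constructed to be exactly rank r for illustration.
W = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))

# "Learn" only k of the m columns (a small subset of the weights).
k = 16
idx = rng.choice(m, size=k, replace=False)
W_obs = W[:, idx]

# Recover a basis for the column space from the observed columns,
# then predict all columns as coefficients in that basis.
# (In the paper the coefficients are learned; here we project W directly.)
U, _, _ = np.linalg.svd(W_obs, full_matrices=False)
U = U[:, :r]
V = U.T @ W
W_pred = U @ V

rel_err = np.linalg.norm(W - W_pred) / np.linalg.norm(W)
stored_fraction = (n * k) / (n * m)
print(f"stored {stored_fraction:.1%} of weights, relative error {rel_err:.2e}")
```

With k = 16 observed columns out of m = 256, only about 6% of the weights are stored and the rest are predicted essentially exactly, mirroring the paper's best case of predicting more than 95% of a network's weights.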
format Conference item
id oxford-uuid:24eb6b3a-d833-4f13-93fb-12277843891b
institution University of Oxford
publishDate 2013
department Department of Computer Science
resource_type http://purl.org/coar/resource_type/c_5794