I-theory on depth vs width: hierarchical function composition

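Deep learning networks with convolution, pooling and subsampling are a special case of hierarchical architectures, which can be represented by trees (such as binary trees). Hierarchical as well as shallow networks can approximate functions of several variables, in particular those that are compositions of low-dimensional functions. We show that the power of a deep network architecture with respect to a shallow network is largely independent of the specific nonlinear operations in the network and depends instead on the behavior of the VC-dimension. A shallow network can approximate compositional functions with the same error as a deep network, but at the cost of a VC-dimension that is exponential rather than quadratic in the dimensionality of the function. To complete the argument, we argue that there exist visual computations that are intrinsically compositional. In particular, we prove that recognition invariant to translation cannot be computed by shallow networks in the presence of clutter. Finally, a general framework that includes the compositional case is sketched. The key condition that allows tall, thin networks to be nicer than short, fat networks is that the target input-output function must be sparse in a certain technical sense.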
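The abstract refers to functions of several variables that are compositions of low-dimensional functions arranged as a binary tree. The following is a minimal Python sketch of that structure, not code from the memo. It assumes, for simplicity, that every node applies the same two-variable function `h` and that the number of inputs is a power of two; the name `binary_tree_compose` is hypothetical. More generally, each node of the tree may use its own constituent function.

```python
from typing import Callable, Sequence

# A constituent function of two variables (hypothetical; any 2-input function works).
Pair = Callable[[float, float], float]


def binary_tree_compose(x: Sequence[float], h: Pair) -> float:
    """Evaluate a binary-tree compositional function of len(x) inputs.

    Assumption: the same 2-variable function h is used at every node.
    At each level, adjacent pairs of values are combined with h, halving
    the number of values until a single output remains.
    """
    level = list(x)
    # Reject empty input and lengths that are not powers of two.
    if not level or len(level) & (len(level) - 1):
        raise ValueError("number of inputs must be a power of two")
    while len(level) > 1:
        # One tree level: each node sees only two arguments.
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


if __name__ == "__main__":
    # ((1 * 2) * (3 * 4)) evaluated as a depth-2 binary tree.
    print(binary_tree_compose([1, 2, 3, 4], lambda a, b: a * b))
```

For example, with `h = lambda a, b: a * b`, the inputs `[1, 2, 3, 4]` are combined as (1·2)·(3·4) = 24. A tree over d inputs has log2(d) levels and d - 1 two-variable nodes, so no node ever operates on more than two arguments.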


Bibliographic Details
Main Authors:	Poggio, Tomaso; Anselmi, Fabio; Rosasco, Lorenzo
Format:	Technical Report
Language:	en_US
Published:	Center for Brains, Minds and Machines (CBMM) 2015
Subjects:	Deep Convolutional Learning Networks (DCLNs); Hierarchy; i-theory
Online Access:	http://hdl.handle.net/1721.1/100559

author	Poggio, Tomaso; Anselmi, Fabio; Rosasco, Lorenzo
collection	MIT
description	Deep learning networks with convolution, pooling and subsampling are a special case of hierarchical architectures, which can be represented by trees (such as binary trees). Hierarchical as well as shallow networks can approximate functions of several variables, in particular those that are compositions of low-dimensional functions. We show that the power of a deep network architecture with respect to a shallow network is largely independent of the specific nonlinear operations in the network and depends instead on the behavior of the VC-dimension. A shallow network can approximate compositional functions with the same error as a deep network, but at the cost of a VC-dimension that is exponential rather than quadratic in the dimensionality of the function. To complete the argument, we argue that there exist visual computations that are intrinsically compositional. In particular, we prove that recognition invariant to translation cannot be computed by shallow networks in the presence of clutter. Finally, a general framework that includes the compositional case is sketched. The key condition that allows tall, thin networks to be nicer than short, fat networks is that the target input-output function must be sparse in a certain technical sense.
first_indexed	2024-09-23T10:21:10Z
format	Technical Report
id	mit-1721.1/100559
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T10:21:10Z
publishDate	2015
publisher	Center for Brains, Minds and Machines (CBMM)
record_format	dspace
funding	This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
dates	Issued 2015-12-29; made available 2015-12-30T02:37:36Z
type	Technical Report; Working Paper; Other
series	CBMM Memo Series; 041
rights	Attribution-NonCommercial 3.0 United States (http://creativecommons.org/licenses/by-nc/3.0/us/)
mimetype	application/pdf
title	I-theory on depth vs width: hierarchical function composition
topic	Deep Convolutional Learning Networks (DCLNs); Hierarchy; i-theory
url	http://hdl.handle.net/1721.1/100559