Deep vs. shallow networks: An approximation theory perspective

Full description

The paper briefly reviews several recent results on hierarchical architectures for learning from examples that may formally explain the conditions under which Deep Convolutional Neural Networks perform much better in function approximation problems than shallow, one-hidden-layer architectures. The paper announces new results for a non-smooth activation function, the ReLU function, used in present-day neural networks, as well as for Gaussian networks. We propose a new definition of relative dimension to encapsulate different notions of sparsity of a function class that can possibly be exploited by deep networks but not by shallow ones to drastically reduce the complexity required for approximation and learning.
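
As a purely illustrative sketch (not code from the paper), the structural contrast the description draws, a shallow one-hidden-layer ReLU network versus a deep, hierarchically composed ReLU network, can be written as follows; all sizes and variable names are hypothetical choices for this example.

import numpy as np

def relu(x):
    # ReLU: the non-smooth activation function discussed in the description.
    return np.maximum(0.0, x)

def shallow_net(x, W1, b1, w2):
    # Shallow architecture: a single hidden layer, f(x) = w2 . relu(W1 x + b1).
    return w2 @ relu(W1 @ x + b1)

def deep_net(x, layers):
    # Deep architecture: ReLU layers composed hierarchically, with a linear output layer.
    h = x
    for W, b in layers[:-1]:
        h = relu(W @ h + b)
    W_out, b_out = layers[-1]
    return W_out @ h + b_out

rng = np.random.default_rng(0)
d = 8                          # input dimension (hypothetical)
x = rng.standard_normal(d)

# Shallow network: one hidden layer of width 32.
out_shallow = shallow_net(x,
                          rng.standard_normal((32, d)),
                          rng.standard_normal(32),
                          rng.standard_normal(32))

# Deep network: a comparable unit budget spread over two narrower hidden layers.
layers = [(rng.standard_normal((16, d)),  rng.standard_normal(16)),
          (rng.standard_normal((16, 16)), rng.standard_normal(16)),
          (rng.standard_normal((1, 16)),  rng.standard_normal(1))]
out_deep = deep_net(x, layers)

print(out_shallow, out_deep)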

Bibliographic Details
Main Authors: Mhaskar, Hrushikesh; Poggio, Tomaso
Format: Technical Report
Series: CBMM Memo Series; 054
Language: en_US
Published: Center for Brains, Minds and Machines (CBMM), arXiv, 2016
Institution: Massachusetts Institute of Technology
Subjects: hierarchical architectures; Deep Convolutional Neural Networks; ReLU function; Gaussian networks
Online Access: http://hdl.handle.net/1721.1/103911
arXiv: arXiv:1608.03287
Rights: Attribution-NonCommercial-ShareAlike 3.0 United States (http://creativecommons.org/licenses/by-nc-sa/3.0/us/)
Funding: This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.