An Empirical and Theoretical Analysis of the Role of Depth in Convolutional Neural Networks

Bibliographic Details
Main Author: Nichani, Eshaan
Other Authors: Uhler, Caroline
Format: Thesis
Published: Massachusetts Institute of Technology, 2022
Online Access: https://hdl.handle.net/1721.1/139174
Description
Summary: Although over-parameterized neural networks can perfectly fit (interpolate) the training data, they often still perform well on test data, contradicting classical learning theory. Recent work provided an explanation for this phenomenon by introducing the double descent curve, showing that increasing model capacity past the interpolation threshold can lead to a decrease in test error. In line with this, it was recently shown empirically and theoretically that increasing neural network capacity through width leads to double descent. In this thesis, we analyze the effect of increasing depth on test performance. In contrast to what is observed for increasing width, we demonstrate through a variety of classification experiments on CIFAR10 and ImageNet32 using fully-convolutional nets, ResNets, and the convolutional neural tangent kernel (CNTK) that test performance is U-shaped and in fact worsens beyond a critical depth. To better understand this phenomenon, we conduct a theoretical analysis of the impact of depth on generalization in linear convolutional networks of infinite width. In particular, we derive the feature map of the linear CNTK for arbitrary depths and identify the depth which minimizes the bias and variance terms of the excess risk. The findings of this thesis imply that increasing depth for interpolating convolutional networks can in fact lead to worse generalization.
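The depth-sweep protocol the abstract describes, training interpolating networks of increasing depth and recording test error to locate the critical depth at the bottom of the U-shaped curve, can be sketched roughly as follows. This is a minimal illustrative sketch, not the thesis's actual experimental code: `train_and_eval` is a hypothetical hook the caller would supply (e.g. training a fully-convolutional net of the given depth), and the toy error curve stands in for real measurements.

```python
# Sketch of a depth sweep: evaluate test error at each depth and report
# the depth that minimizes it. The training/evaluation routine is a
# placeholder supplied by the caller, not the thesis's setup.

def run_depth_sweep(depths, train_and_eval):
    """Return (depth, test_error) pairs and the error-minimizing depth."""
    results = [(d, train_and_eval(d)) for d in depths]
    critical_depth, _ = min(results, key=lambda pair: pair[1])
    return results, critical_depth

# Toy stand-in: a synthetic U-shaped test-error curve with its minimum
# at depth 5, mimicking the qualitative shape reported in the abstract.
def toy_error(depth):
    return (depth - 5) ** 2 / 100 + 0.1

results, critical_depth = run_depth_sweep(range(1, 11), toy_error)
print(critical_depth)  # depth minimizing the toy test-error curve
```

In the thesis's experiments the curve is traced empirically on CIFAR10 and ImageNet32; the point of the sketch is only the shape of the protocol: sweep depth, measure test error, find the minimizer.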