Summary: | <p>Over-parameterised modern neural networks owe their success to two fundamental properties: expressive power and generalisation capability. The former refers to the model's ability to fit a large variety of data sets, while the latter enables the network to extrapolate patterns from training examples and apply them to previously unseen data. This thesis addresses a few challenges related to these two key properties.</p>
<p>The fact that over-parameterised networks can fit any data set is not always indicative of their practical expressiveness. This is the subject of the first part of this thesis, where we examine how input information can get lost while propagating through a deep architecture, and we propose an easily implementable solution: the introduction of suitable scaling factors and residual connections.</p>
<p>The second part of this thesis focuses on generalisation. Why modern neural networks generalise well to new data without overfitting, despite being over-parameterised, is an open question that is currently receiving considerable attention in the research community. We explore this subject from information-theoretic and PAC-Bayesian viewpoints, proposing novel learning algorithms and generalisation bounds.</p>