On Neural Network Pruning’s Effect on Generalization

Bibliographic Details
Main Author: Jin, Tian
Other Authors: Carbin, Michael
Format: Thesis
Published: Massachusetts Institute of Technology 2023
Online Access: https://hdl.handle.net/1721.1/147496
Description
Summary: Practitioners frequently observe that pruning improves model generalization. A longstanding hypothesis attributes such improvement to model size reduction. However, recent studies on over-parameterization characterize a new model size regime in which larger models achieve better generalization. A contradiction arises when pruning is applied to over-parameterized models: while theory predicts that reducing size harms generalization, pruning nonetheless improves it. Motivated by this contradiction, I re-examine pruning's effect on generalization empirically. I demonstrate that pruning's generalization-improving effect cannot be fully accounted for by weight removal alone. Instead, I find that pruning can lead to better training, improving model training loss. I also find that pruning can lead to stronger regularization, mitigating the harmful effect of noisy examples. Pruning extends model training time and reduces model size, which improve training and strengthen regularization, respectively. I empirically demonstrate that both factors are essential to fully explaining pruning's benefits to generalization.
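
For readers unfamiliar with the operation the abstract refers to, the sketch below illustrates global magnitude pruning, the common baseline in which the smallest-magnitude weights are zeroed out. This is only an illustrative assumption about what "pruning" denotes here; the specific algorithms and sparsity levels studied in the thesis are not given in this record, and the sparsity value used below is an arbitrary example.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.8):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Illustrative sketch of global magnitude pruning; `sparsity=0.8` is an
    assumed example value, not a setting taken from the thesis.
    """
    flat = np.concatenate([w.ravel() for w in weights])
    k = int(sparsity * flat.size)
    if k == 0:
        return [w.copy() for w in weights]
    # k-th smallest absolute value over all layers serves as a global threshold.
    threshold = np.partition(np.abs(flat), k - 1)[k - 1]
    return [np.where(np.abs(w) <= threshold, 0.0, w) for w in weights]

# Example: prune the weight matrices of a small two-layer network and
# report the resulting fraction of zeroed weights.
rng = np.random.default_rng(0)
layers = [rng.standard_normal((4, 8)), rng.standard_normal((8, 2))]
pruned = magnitude_prune(layers, sparsity=0.8)
print(sum((p == 0).sum() for p in pruned) / sum(p.size for p in pruned))
```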