Transformer Pruning Relation and General Neural Network Augmentation

In this thesis, a method of initializing neural networks with weights transferred from smaller trained networks was investigated. We name this process augmentation and present several versions of it, some of which involve pruning. First, the pruning relation of testing loss against density was found for the GPT-2 transformer network on a causal language modeling task; an interesting double plateau of testing loss was observed whenever the attention weights were pruned. Next, augmentation on low-dimensional datasets and shallow networks was investigated, and we found that performing a step of zeroing final layer initializations (ZFLI) results in better augmentation. With this insight, we proceeded to investigate a variety of datasets and networks. Two forms of augmentation were investigated: basic augmentation and pruned augmentation. However, neither form produced a consistent improvement in testing accuracy or loss.

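The "basic augmentation" and ZFLI steps described in the abstract can be pictured with a minimal PyTorch sketch. This is not code from the thesis: the layer sizes, the block-wise weight copy, and the reading of ZFLI as zeroing the larger network's final-layer initialization are assumptions made purely for illustration.

import torch
import torch.nn as nn

def augment(small: nn.Sequential, large: nn.Sequential, zfli: bool = True) -> nn.Sequential:
    """Initialize `large` by copying each trained linear layer of `small` into the
    top-left block of the corresponding (wider) layer of `large`; untouched entries
    keep their random initialization. Optionally zero the final layer first (ZFLI)."""
    small_linears = [m for m in small if isinstance(m, nn.Linear)]
    large_linears = [m for m in large if isinstance(m, nn.Linear)]
    assert len(small_linears) == len(large_linears), "networks must have equal depth"

    with torch.no_grad():
        if zfli:
            # ZFLI (assumed reading): wipe the random init of the final layer.
            large_linears[-1].weight.zero_()
            large_linears[-1].bias.zero_()
        for s, l in zip(small_linears, large_linears):
            rows, cols = s.weight.shape
            l.weight[:rows, :cols] = s.weight  # transfer trained weights
            l.bias[:rows] = s.bias
    return large

# Usage: assume `small_net` (2-16-2) has already been trained; grow it into a 2-64-2 net.
small_net = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
large_net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))
large_net = augment(small_net, large_net)

Pruned augmentation would additionally prune the transferred weights; that variant is omitted from this sketch.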

Bibliographic Details
Main Author: Lim, Yong Hui
Other Authors: Shavit, Nir
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/139547
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: M.Eng.
Rights: In Copyright - Educational Use Permitted; Copyright MIT (http://rightsstatements.org/page/InC-EDU/1.0/)