Transformer Pruning Relation and General Neural Network Augmentation

In this thesis, a method of initializing neural networks with weights transferred from smaller trained networks was investigated. We name this process augmentation and present several versions of it, some of which involve pruning. First, the pruning relation of testing loss against density was found for the GPT-2 transformer network on a causal language modeling task; an interesting double plateau of testing loss was observed whenever the attention weights were pruned. Next, augmentation on low-dimensional datasets and shallow networks was investigated, and we found that performing a step of zeroing final layer initializations (ZFLI) results in better augmentation. With this insight, we proceeded to investigate a variety of datasets and networks. Two forms of augmentation were investigated: basic augmentation and pruned augmentation. However, neither form produced a consistent improvement in testing accuracy or loss.

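The "basic augmentation" and ZFLI steps described in the abstract can be pictured with a minimal PyTorch sketch. This is not code from the thesis: the layer sizes, the block-wise weight copy, and the reading of ZFLI as zeroing the larger network's final-layer initialization are assumptions made purely for illustration.

import torch
import torch.nn as nn

def augment(small: nn.Sequential, large: nn.Sequential, zfli: bool = True) -> nn.Sequential:
    """Initialize `large` by copying each trained linear layer of `small` into the
    top-left block of the corresponding (wider) layer of `large`; untouched entries
    keep their random initialization. Optionally zero the final layer first (ZFLI)."""
    small_linears = [m for m in small if isinstance(m, nn.Linear)]
    large_linears = [m for m in large if isinstance(m, nn.Linear)]
    assert len(small_linears) == len(large_linears), "networks must have equal depth"

    with torch.no_grad():
        if zfli:
            # ZFLI (assumed reading): wipe the random init of the final layer.
            large_linears[-1].weight.zero_()
            large_linears[-1].bias.zero_()
        for s, l in zip(small_linears, large_linears):
            rows, cols = s.weight.shape
            l.weight[:rows, :cols] = s.weight  # transfer trained weights
            l.bias[:rows] = s.bias
    return large

# Usage: assume `small_net` (2-16-2) has already been trained; grow it into a 2-64-2 net.
small_net = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
large_net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))
large_net = augment(small_net, large_net)

Pruned augmentation would additionally prune the transferred weights; that variant is omitted from this sketch.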

Bibliographic Details
Main Author: Lim, Yong Hui
Other Authors: Shavit, Nir
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/139547
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: M.Eng.
Rights: In Copyright - Educational Use Permitted; Copyright MIT (http://rightsstatements.org/page/InC-EDU/1.0/)