Transformer Pruning Relation and General Neural Network Augmentation
In this thesis, a method of initializing neural networks with weights transferred from smaller, already-trained networks was investigated. We name this process augmentation and present several versions of it, some of which involve pruning. First, the pruning relation of testing loss against density was found for the GPT-2 transformer network on a causal language modeling task; an interesting double plateau of testing loss appeared whenever the attention weights were pruned. Next, augmentation on low-dimensional datasets and shallow networks was investigated. We found that performing a step of zeroing final-layer initializations (ZFLI) results in better augmentation. With this insight, we proceeded to investigate a variety of datasets and networks, using two forms of augmentation: basic augmentation and pruned augmentation. However, neither form of augmentation produced any consistent improvement in testing accuracy/loss.
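The thesis itself is not reproduced in this record, so the sketch below is only a minimal illustration of what the "basic augmentation" step with ZFLI described in the abstract might look like for small fully-connected networks. The helper names `make_mlp` and `augment`, the layer widths, and the block-copy transfer scheme are assumptions for illustration, not the author's code.

```python
# Hypothetical sketch: transfer trained weights from a smaller MLP into the
# top-left block of a larger MLP's weight matrices, then zero the final layer
# (the ZFLI step mentioned in the abstract). Names and sizes are illustrative.
import torch
import torch.nn as nn


def make_mlp(widths):
    """Build a plain MLP with ReLU activations between Linear layers."""
    layers = []
    for i in range(len(widths) - 1):
        layers.append(nn.Linear(widths[i], widths[i + 1]))
        if i < len(widths) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)


@torch.no_grad()
def augment(small, large, zero_final_layer=True):
    """Copy each trained Linear layer of `small` into the corresponding
    Linear layer of `large` (top-left block); optionally zero the final
    layer of `large` before further training."""
    small_linears = [m for m in small if isinstance(m, nn.Linear)]
    large_linears = [m for m in large if isinstance(m, nn.Linear)]
    assert len(small_linears) == len(large_linears)
    for s, l in zip(small_linears, large_linears):
        out_s, in_s = s.weight.shape
        l.weight[:out_s, :in_s] = s.weight  # embed small weights in large matrix
        l.bias[:out_s] = s.bias
    if zero_final_layer:
        large_linears[-1].weight.zero_()  # ZFLI: zero the final layer
        large_linears[-1].bias.zero_()
    return large


# Example: transfer a trained 2-16-1 network into a wider 2-64-1 network.
small_net = make_mlp([2, 16, 1])   # assume this has already been trained
large_net = make_mlp([2, 64, 1])
large_net = augment(small_net, large_net)
```

The "pruned augmentation" variant mentioned in the abstract would presumably also prune the transferred weights; that variant is not sketched here.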
Main Author: | Lim, Yong Hui |
---|---|
Other Authors: | Shavit, Nir |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2022 |
Online Access: | https://hdl.handle.net/1721.1/139547 |
author | Lim, Yong Hui |
author2 | Shavit, Nir |
collection | MIT |
description | In this thesis, a method of initializing neural networks with weights transferred from smaller, already-trained networks was investigated. We name this process augmentation and present several versions of it, some of which involve pruning. First, the pruning relation of testing loss against density was found for the GPT-2 transformer network on a causal language modeling task; an interesting double plateau of testing loss appeared whenever the attention weights were pruned. Next, augmentation on low-dimensional datasets and shallow networks was investigated. We found that performing a step of zeroing final-layer initializations (ZFLI) results in better augmentation. With this insight, we proceeded to investigate a variety of datasets and networks, using two forms of augmentation: basic augmentation and pruned augmentation. However, neither form of augmentation produced any consistent improvement in testing accuracy/loss. |
format | Thesis |
id | mit-1721.1/139547 |
institution | Massachusetts Institute of Technology |
publishDate | 2022 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
degree | M.Eng.
date_issued | 2021-06
date_available | 2022-01-14T15:19:02Z
rights | In Copyright - Educational Use Permitted; Copyright MIT; http://rightsstatements.org/page/InC-EDU/1.0/
file_format | application/pdf
title | Transformer Pruning Relation and General Neural Network Augmentation |
url | https://hdl.handle.net/1721.1/139547 |