Evaluating Adaptive Layer Freezing through Hyperparameter Optimization for Enhanced Fine-Tuning Performance of Language Models
Language models are initially trained on large datasets, enabling them to extract patterns and establish rich contextual connections. When dealing with data scarcity, transfer learning has become the go-to method to use these models in specialized downstream tasks via fine-tuning. However, fine-tuni...
Main Author: | Figueroa, Reinaldo |
---|---|
Other Authors: | Murray, Fiona |
Format: | Thesis |
Published: | Massachusetts Institute of Technology 2024 |
Online Access: | https://hdl.handle.net/1721.1/157169 |
_version_ | 1824458046219223040 |
---|---|
author | Figueroa, Reinaldo |
author2 | Murray, Fiona |
author_facet | Murray, Fiona Figueroa, Reinaldo |
author_sort | Figueroa, Reinaldo |
collection | MIT |
description | Language models are initially trained on large datasets, enabling them to extract patterns and establish rich contextual connections. When dealing with data scarcity, transfer learning has become the go-to method to use these models in specialized downstream tasks via fine-tuning. However, fine-tuning on small datasets can lead to overfitting and a lack of generalization. Generalization is crucial when deploying models that perform sensitive tasks in a real-world environment, as it dictates how well they perform on unseen data. Conversely, overfitting is highly likely to occur when training on small datasets. This thesis proposes and evaluates a new method for fine-tuning language models by adaptively choosing specific learning rates for each transformer layer that provide higher performance on in-domain low-volume datasets. Additionally, we explore which layers inside the models usually hold more contextual information from pre-training that might be valuable to keep ‘frozen’ when fine-tuning on small datasets. This analysis provides insights into fine-tuning approaches during initial experiments when data is limited. Our results demonstrate limited performance gains on certain models while achieving more significant gains on others when fine-tuning using our proposed method. Our work also provides valuable insight into the per-layer importance of language models by showing that certain layers have a stronger direct correlation with the overall model accuracy. |
first_indexed | 2025-02-19T04:19:40Z |
format | Thesis |
id | mit-1721.1/157169 |
institution | Massachusetts Institute of Technology |
last_indexed | 2025-02-19T04:19:40Z |
publishDate | 2024 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/1571692024-10-10T03:48:31Z Evaluating Adaptive Layer Freezing through Hyperparameter Optimization for Enhanced Fine-Tuning Performance of Language Models Figueroa, Reinaldo Murray, Fiona Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Language models are initially trained on large datasets, enabling them to extract patterns and establish rich contextual connections. When dealing with data scarcity, transfer learning has become the go-to method to use these models in specialized downstream tasks via fine-tuning. However, fine-tuning on small datasets can lead to overfitting and a lack of generalization. Generalization is crucial when deploying models that perform sensitive tasks in a real-world environment, as it dictates how well they perform on unseen data. Conversely, overfitting is highly likely to occur when training on small datasets. This thesis proposes and evaluates a new method for fine-tuning language models by adaptively choosing specific learning rates for each transformer layer that provide higher performance on in-domain low-volume datasets. Additionally, we explore which layers inside the models usually hold more contextual information from pre-training that might be valuable to keep ‘frozen’ when fine-tuning on small datasets. This analysis provides insights into fine-tuning approaches during initial experiments when data is limited. Our results demonstrate limited performance gains on certain models while achieving more significant gains on others when fine-tuning using our proposed method. Our work also provides valuable insight into the per-layer importance of language models by showing that certain layers have a stronger direct correlation with the overall model accuracy. M.Eng. 
2024-10-09T18:25:53Z 2024-10-09T18:25:53Z 2024-09 2024-10-07T14:34:33.899Z Thesis https://hdl.handle.net/1721.1/157169 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
spellingShingle | Figueroa, Reinaldo Evaluating Adaptive Layer Freezing through Hyperparameter Optimization for Enhanced Fine-Tuning Performance of Language Models |
title | Evaluating Adaptive Layer Freezing through Hyperparameter Optimization for Enhanced Fine-Tuning Performance of Language Models |
title_full | Evaluating Adaptive Layer Freezing through Hyperparameter Optimization for Enhanced Fine-Tuning Performance of Language Models |
title_fullStr | Evaluating Adaptive Layer Freezing through Hyperparameter Optimization for Enhanced Fine-Tuning Performance of Language Models |
title_full_unstemmed | Evaluating Adaptive Layer Freezing through Hyperparameter Optimization for Enhanced Fine-Tuning Performance of Language Models |
title_short | Evaluating Adaptive Layer Freezing through Hyperparameter Optimization for Enhanced Fine-Tuning Performance of Language Models |
title_sort | evaluating adaptive layer freezing through hyperparameter optimization for enhanced fine tuning performance of language models |
url | https://hdl.handle.net/1721.1/157169 |
work_keys_str_mv | AT figueroareinaldo evaluatingadaptivelayerfreezingthroughhyperparameteroptimizationforenhancedfinetuningperformanceoflanguagemodels |