Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization

We consider the problem of accurate quantization for language models, where both the weights and activations are quantized to 4 bits per parameter with uniform quantization, the lowest bitwidth format natively supported by existing GPU hardware. In this context, the key challenge is activation quantization: language models are known to contain outlier channels whose average magnitudes are orders of magnitude higher than those of other channels, which prevents accurate low-bitwidth quantization with known techniques. We systematically study this phenomenon and find that outlier channels emerge early in training and occur more frequently in layers with residual streams. We then propose a simple strategy that regularizes a layer's inputs via quantization-aware training (QAT) and its outputs via activation kurtosis regularization. We show that regularizing both the inputs and outputs is crucial for preventing the model from "migrating" the difficulty of input quantization to the weights, which would make post-training quantization (PTQ) of the weights more difficult. When combined with weight PTQ, our approach obtains a W4A4 model with integer quantization that performs competitively with the standard-precision W16A16 baseline.

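To illustrate the activation-quantization challenge the abstract describes, here is a minimal numpy sketch of symmetric per-tensor uniform int4 quantization. It is not the thesis's implementation; the quantization scheme, shapes, and the synthetic outlier channel are illustrative assumptions. With one channel two orders of magnitude larger than the rest, the shared scale is dominated by the outlier and the normal channels lose nearly all precision.

```python
import numpy as np

def quantize_int4(x):
    """Symmetric per-tensor uniform quantization to 4-bit integers.

    int4 covers [-8, 7]; a single scale maps the largest-magnitude
    value in the tensor onto that range.
    """
    scale = np.abs(x).max() / 7.0
    q = np.clip(np.round(x / scale), -8, 7)
    return q * scale  # dequantized values

# Synthetic activations with one outlier channel, as described in the
# abstract: most channels are O(1), one is orders of magnitude larger.
rng = np.random.default_rng(0)
acts = rng.normal(size=(128, 8))
acts[:, 3] *= 100.0  # hypothetical outlier channel

# Mean absolute error on the normal channels, with and without the
# outlier participating in the shared scale.
err_with_outlier = np.abs(quantize_int4(acts) - acts)[:, :3].mean()
err_without = np.abs(quantize_int4(acts[:, :3]) - acts[:, :3]).mean()
print(err_with_outlier, err_without)
```

The error on the normal channels is far larger when the outlier channel sets the scale, which is why per-tensor low-bitwidth activation quantization fails without some mitigation of the outliers.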

Bibliographic Details
Main Author: Nrusimha, Aniruddha
Other Authors: Kim, Yoon
Format: Thesis
Published: Massachusetts Institute of Technology, 2024
Online Access: https://hdl.handle.net/1721.1/156280
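The abstract also mentions activation kurtosis regularization. As a hedged sketch (the statistic below is standard excess kurtosis; the exact penalty form used in the thesis is not given in this record), the idea is that heavy-tailed, outlier-dominated activations have large positive excess kurtosis, so penalizing it pushes activations toward a more quantization-friendly distribution.

```python
import numpy as np

def excess_kurtosis(x, eps=1e-6):
    """Excess kurtosis of a flattened activation tensor.

    Roughly 0 for Gaussian activations; large and positive for
    heavy-tailed (outlier-dominated) activations.
    """
    x = np.asarray(x).ravel()
    mu, var = x.mean(), x.var()
    return ((x - mu) ** 4).mean() / (var + eps) ** 2 - 3.0

rng = np.random.default_rng(0)
gaussian = rng.normal(size=10_000)
heavy = gaussian.copy()
heavy[:50] *= 100.0  # inject a few outliers

print(excess_kurtosis(gaussian))  # near zero
print(excess_kurtosis(heavy))     # large positive value
```

In training, a term like this (weighted and added to the task loss) would penalize a layer's outputs for developing outlier channels, complementing QAT on the layer's inputs.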
Degree: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May 2024
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0); copyright retained by author(s). https://creativecommons.org/licenses/by-nc-nd/4.0/