Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization

We consider the problem of accurate quantization for language models, where both the weights and activations are quantized to 4 bits per parameter with uniform quantization, the lowest bitwidth format natively supported by existing GPU hardware. In this context, the key challenge is activation quantization...
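The abstract refers to uniform 4-bit quantization and to outlier activation channels as the key obstacle. The sketch below is not from the thesis; it is a minimal illustration, assuming a symmetric per-tensor uniform scheme, of why a single large-magnitude channel degrades 4-bit quantization. All names (`uniform_quantize`, the simulated `acts` tensor) are hypothetical.

```python
import numpy as np

# Symmetric uniform "fake" quantization: round values onto a 4-bit integer
# grid, then dequantize so the rounding error can be inspected in float.
def uniform_quantize(x: np.ndarray, bits: int = 4) -> np.ndarray:
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4 bits (int4 range is [-8, 7])
    scale = np.abs(x).max() / qmax             # per-tensor scale set by the largest magnitude
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale                           # back to float for error measurement

rng = np.random.default_rng(0)
acts = rng.standard_normal((4, 8)).astype(np.float32)
acts[:, 0] *= 100.0                            # one simulated outlier channel

# The outlier channel inflates the shared scale, so the remaining channels
# are rounded on a very coarse grid and lose most of their precision.
err = np.abs(uniform_quantize(acts) - acts).mean()
print(f"mean absolute quantization error: {err:.4f}")
```

Rerunning without the `acts[:, 0] *= 100.0` line shows a far smaller error, which is the motivation for regularizing activations so that no channel dominates the quantization scale.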

Bibliographic Details
Main Author: Nrusimha, Aniruddha
Other Authors: Kim, Yoon
Format: Thesis
Published: Massachusetts Institute of Technology, 2024
Online Access: https://hdl.handle.net/1721.1/156280