Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization
We consider the problem of accurate quantization for language models, where both the weights and activations are quantized to 4 bits per parameter with uniform quantization, the lowest bitwidth format natively supported by existing GPU hardware. In this context, the key challenge is activation quant...
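The abstract refers to uniform 4-bit quantization of weights and activations. As a generic illustration of that format (not the thesis's specific method, which is not reproduced in this record), a symmetric per-tensor uniform quantizer can be sketched as:

```python
import numpy as np

def quantize_uniform(x, bits=4):
    """Symmetric uniform quantization of a float array to `bits` bits.

    A generic sketch of the uniform 4-bit format mentioned in the
    abstract; the thesis's actual scheme may differ (e.g. per-channel
    scales or activation regularization are not shown here).
    """
    qmax = 2 ** (bits - 1) - 1            # 7 for signed 4-bit
    scale = np.max(np.abs(x)) / qmax      # per-tensor scale factor
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to approximate float values."""
    return q.astype(np.float32) * scale

x = np.array([0.1, -0.5, 2.0, -1.3], dtype=np.float32)
q, s = quantize_uniform(x)
x_hat = dequantize(q, s)
```

A single large-magnitude outlier inflates `scale` and coarsens the grid for every other value, which is one way to see why outlier channels make activation quantization hard.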
Main Author:
Other Authors:
Format: Thesis
Published: Massachusetts Institute of Technology, 2024
Online Access: https://hdl.handle.net/1721.1/156280