HyperBlock floating point: generalised quantization scheme for gradient and inference computation
Prior quantization methods focus on producing networks for fast and lightweight inference. However, the cost of unquantised training is overlooked, despite requiring significantly more time and energy than inference. We present a method for quantizing convolutional neural networks for efficient training. Quantizing gradients is challenging because it requires higher granularity, and gradient values span a wider range than those of the weights and feature maps. We propose an extension of the Channel-wise Block Floating Point format that allows for quick gradient computation while requiring minimal quantization time. This is achieved by sharing an exponent across both the depth and batch dimensions, so that tensors can be quantized once and reused during backpropagation. We test our method on standard models such as AlexNet, VGG, and ResNet, using the CIFAR10, SVHN, and ImageNet datasets. We show no loss of accuracy when quantizing AlexNet weights, activations, and gradients to only 4 bits when training on ImageNet.
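The abstract describes the core mechanism only at a high level: a block floating point format in which one exponent is shared across both the depth (channel) and batch dimensions, so that a tensor is quantized once and its quantized form is reused in the backward pass. The sketch below is a minimal, hand-written illustration of that shared-exponent idea in NumPy; the function name `bfp_quantize`, the exponent and mantissa bookkeeping, and the rounding and clipping choices are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def bfp_quantize(x, mantissa_bits=4, shared_axes=(0, 1)):
    """Quantize a 4D tensor (batch, channel, height, width) to a block
    floating point format.

    Elements that differ only along `shared_axes` (here batch and
    channel/depth) fall into the same block and share one power-of-two
    exponent; each element keeps a signed mantissa of `mantissa_bits` bits.
    Illustrative sketch only, not the paper's implementation.
    """
    # The largest magnitude in each block fixes the shared exponent.
    max_abs = np.max(np.abs(x), axis=shared_axes, keepdims=True)
    max_abs = np.where(max_abs == 0, 1.0, max_abs)   # guard against log2(0)
    exponent = np.floor(np.log2(max_abs))

    # One power-of-two scale per block: map the block's range onto the
    # signed mantissa range [-2^(m-1), 2^(m-1) - 1].
    scale = 2.0 ** (exponent + 1) / 2.0 ** (mantissa_bits - 1)
    mantissa = np.clip(np.round(x / scale),
                       -(2 ** (mantissa_bits - 1)),
                       2 ** (mantissa_bits - 1) - 1).astype(np.int8)

    # Dequantized view (what the next layer sees), plus the raw pieces.
    return mantissa * scale, mantissa, exponent

# Example: quantize a batch of activations to 4-bit mantissas, with one
# exponent shared across the batch and channel dimensions of each block.
acts = np.random.randn(8, 16, 32, 32).astype(np.float32)
deq, mantissas, exps = bfp_quantize(acts, mantissa_bits=4)
```

Within each block, values reduce to small integers against a single power-of-two scale, which is roughly the representation a low-precision multiply-accumulate datapath would consume; reusing the same quantized blocks in the backward pass is what avoids re-quantizing for gradient computation.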
Main Authors: | Gennari do Nascimento, M; Adrian Prisacariu, V; Fawcett, R; Langhammer, M |
---|---|
Format: | Conference item |
Language: | English |
Published: | IEEE, 2023 |
author | Gennari do Nascimento, M; Adrian Prisacariu, V; Fawcett, R; Langhammer, M |
---|---|
collection | OXFORD |
description | Prior quantization methods focus on producing networks for fast and lightweight inference. However, the cost of unquantised training is overlooked, despite requiring significantly more time and energy than inference. We present a method for quantizing convolutional neural networks for efficient training. Quantizing gradients is challenging because it requires higher granularity, and gradient values span a wider range than those of the weights and feature maps. We propose an extension of the Channel-wise Block Floating Point format that allows for quick gradient computation while requiring minimal quantization time. This is achieved by sharing an exponent across both the depth and batch dimensions, so that tensors can be quantized once and reused during backpropagation. We test our method on standard models such as AlexNet, VGG, and ResNet, using the CIFAR10, SVHN, and ImageNet datasets. We show no loss of accuracy when quantizing AlexNet weights, activations, and gradients to only 4 bits when training on ImageNet. |
format | Conference item |
id | oxford-uuid:60e5a93b-c3ab-48e1-95e8-42671e0c1698 |
institution | University of Oxford |
language | English |
publishDate | 2023 |
publisher | IEEE |
record_format | dspace |
title | HyperBlock floating point: generalised quantization scheme for gradient and inference computation |