HyperBlock floating point: generalised quantization scheme for gradient and inference computation

Prior quantization methods focus on producing networks for fast and lightweight inference. However, the cost of unquantised training is overlooked, despite requiring significantly more time and energy than inference. We present a method for quantizing convolutional neural networks for efficient training. Quantizing gradients is challenging because it requires higher granularity and their values span a wider range than the weight and feature maps. We propose an extension of the Channel-wise Block Floating Point format that allows for quick gradient computation, using a minimal amount of quantization time. This is achieved through sharing an exponent across both depth and batch dimensions in order to quantize tensors once and reuse them during backpropagation. We test our method using standard models such as AlexNet, VGG, and ResNet, on the CIFAR10, SVHN and ImageNet datasets. We show no loss of accuracy when quantizing AlexNet weights, activations and gradients to only 4 bits training ImageNet.

Bibliographic Details
Main Authors: Gennari do Nascimento, M; Adrian Prisacariu, V; Fawcett, R; Langhammer, M
Format: Conference item
Language: English
Published: IEEE, 2023
Institution: University of Oxford
Identifier: oxford-uuid:60e5a93b-c3ab-48e1-95e8-42671e0c1698
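
The abstract describes quantizing each tensor once, with a single exponent shared across the batch and depth (channel) dimensions, so the same quantized copy can be reused during backpropagation. The following is a minimal NumPy sketch of that idea for a block floating point quantizer; the function name `block_fp_quantize`, the `shared_axes` and `mantissa_bits` parameters, and the NCHW block layout are illustrative assumptions and do not reproduce the paper's exact format.

```python
import numpy as np

def block_fp_quantize(x, mantissa_bits=4, shared_axes=(0, 1)):
    # One exponent per block; each block spans the axes listed in shared_axes
    # (here the batch and channel axes of an NCHW tensor), so the quantized
    # tensor can be reused in the backward pass without re-quantizing.
    max_abs = np.max(np.abs(x), axis=shared_axes, keepdims=True)
    max_abs = np.where(max_abs == 0, 1.0, max_abs)            # avoid log2(0)

    # Shared exponent chosen so the largest value in the block fits the mantissa.
    exponent = np.ceil(np.log2(max_abs)) - (mantissa_bits - 1)
    scale = 2.0 ** exponent                                   # power-of-two scale

    qmax = 2 ** (mantissa_bits - 1) - 1                       # e.g. 7 for 4 bits
    mantissa = np.clip(np.round(x / scale), -qmax - 1, qmax)  # signed integer mantissa

    # Return the dequantized tensor plus the raw mantissas and block exponents.
    return mantissa * scale, mantissa.astype(np.int8), exponent


# Example: quantize an NCHW activation tensor to 4 bits, sharing each exponent
# across the batch (N) and channel (C) axes.
acts = np.random.randn(8, 16, 14, 14).astype(np.float32)
deq, mant, exp = block_fp_quantize(acts, mantissa_bits=4, shared_axes=(0, 1))
print(deq.shape, mant.dtype, exp.shape)   # (8, 16, 14, 14) int8 (1, 1, 14, 14)
```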