HyperBlock floating point: generalised quantization scheme for gradient and inference computation
Prior quantization methods focus on producing networks for fast and lightweight inference. However, the cost of unquantised training is overlooked, despite requiring significantly more time and energy than inference. We present a method for quantizing convolutional neural networks for efficient training. Quantizing gradients is challenging because it requires higher granularity, and gradient values span a wider range than those of the weights and feature maps. We propose an extension of the Channel-wise Block Floating Point format that allows for quick gradient computation while requiring minimal quantization time. This is achieved by sharing an exponent across both the depth and batch dimensions, so that tensors can be quantized once and reused during backpropagation. We test our method on standard models such as AlexNet, VGG, and ResNet, using the CIFAR10, SVHN, and ImageNet datasets. We show no loss of accuracy when quantizing AlexNet weights, activations, and gradients to only 4 bits when training on ImageNet.
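The abstract describes the core mechanism only at a high level: a block floating point format in which one exponent is shared across both the depth (channel) and batch dimensions, so that a tensor is quantized once and its quantized form is reused in the backward pass. The sketch below is a minimal, hand-written illustration of that shared-exponent idea in NumPy; the function name `bfp_quantize`, the exponent and mantissa bookkeeping, and the rounding and clipping choices are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def bfp_quantize(x, mantissa_bits=4, shared_axes=(0, 1)):
    """Quantize a 4D tensor (batch, channel, height, width) to a block
    floating point format.

    Elements that differ only along `shared_axes` (here batch and
    channel/depth) fall into the same block and share one power-of-two
    exponent; each element keeps a signed mantissa of `mantissa_bits` bits.
    Illustrative sketch only, not the paper's implementation.
    """
    # The largest magnitude in each block fixes the shared exponent.
    max_abs = np.max(np.abs(x), axis=shared_axes, keepdims=True)
    max_abs = np.where(max_abs == 0, 1.0, max_abs)   # guard against log2(0)
    exponent = np.floor(np.log2(max_abs))

    # One power-of-two scale per block: map the block's range onto the
    # signed mantissa range [-2^(m-1), 2^(m-1) - 1].
    scale = 2.0 ** (exponent + 1) / 2.0 ** (mantissa_bits - 1)
    mantissa = np.clip(np.round(x / scale),
                       -(2 ** (mantissa_bits - 1)),
                       2 ** (mantissa_bits - 1) - 1).astype(np.int8)

    # Dequantized view (what the next layer sees), plus the raw pieces.
    return mantissa * scale, mantissa, exponent

# Example: quantize a batch of activations to 4-bit mantissas, with one
# exponent shared across the batch and channel dimensions of each block.
acts = np.random.randn(8, 16, 32, 32).astype(np.float32)
deq, mantissas, exps = bfp_quantize(acts, mantissa_bits=4)
```

Within each block, values reduce to small integers against a single power-of-two scale, which is roughly the representation a low-precision multiply-accumulate datapath would consume; reusing the same quantized blocks in the backward pass is what avoids re-quantizing for gradient computation.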
Main Authors: | Gennari do Nascimento, M; Adrian Prisacariu, V; Fawcett, R; Langhammer, M |
---|---|
Format: | Conference item |
Language: | English |
Published: | IEEE, 2023 |
author | Gennari do Nascimento, M; Adrian Prisacariu, V; Fawcett, R; Langhammer, M |
---|---|
collection | OXFORD |
description | Prior quantization methods focus on producing networks for fast and lightweight inference. However, the cost of unquantised training is overlooked, despite requiring significantly more time and energy than inference. We present a method for quantizing convolutional neural networks for efficient training. Quantizing gradients is challenging because it requires higher granularity, and gradient values span a wider range than those of the weights and feature maps. We propose an extension of the Channel-wise Block Floating Point format that allows for quick gradient computation while requiring minimal quantization time. This is achieved by sharing an exponent across both the depth and batch dimensions, so that tensors can be quantized once and reused during backpropagation. We test our method on standard models such as AlexNet, VGG, and ResNet, using the CIFAR10, SVHN, and ImageNet datasets. We show no loss of accuracy when quantizing AlexNet weights, activations, and gradients to only 4 bits when training on ImageNet. |
format | Conference item |
id | oxford-uuid:60e5a93b-c3ab-48e1-95e8-42671e0c1698 |
institution | University of Oxford |
language | English |
publishDate | 2023 |
publisher | IEEE |
record_format | dspace |
title | HyperBlock floating point: generalised quantization scheme for gradient and inference computation |