Neuron-by-Neuron Quantization for Efficient Low-Bit QNN Training

Quantized neural networks (QNNs) are widely used to achieve computationally efficient solutions to recognition problems. Overall, eight-bit QNNs have almost the same accuracy as full-precision networks while running several times faster. However, networks with lower quantization levels demonstrate inferior accuracy in comparison to their classical analogs. To address this issue, a number of quantization-aware training (QAT) approaches have been proposed. In this paper, we study QAT approaches for two- to eight-bit linear quantization schemes and propose a new combined QAT approach: neuron-by-neuron quantization with straight-through estimator (STE) gradient forwarding. It is suitable for bit widths from two to eight and eliminates significant accuracy drops during training, which results in higher accuracy of the final QNN. We experimentally evaluate our approach on CIFAR-10 and ImageNet classification and show that it is comparable to other approaches at four to eight bits and outperforms some of them at two and three bits while being easier to implement. For example, the proposed approach to three-bit quantization on CIFAR-10 yields 73.2% accuracy, while the direct and layer-by-layer baselines yield 71.4% and 67.2%, respectively. For two-bit quantization of ResNet18 on ImageNet, our approach reaches 63.69% accuracy versus 61.55% for the direct baseline.
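To make the abstract's key terms concrete, the following is a minimal sketch of linear (uniform) fake quantization trained with straight-through estimator (STE) gradient forwarding. It is not the authors' implementation: the PyTorch module, the symmetric signed range, and the max-based per-tensor scale are assumptions chosen purely for illustration.

import torch

class FakeQuantSTE(torch.autograd.Function):
    """Uniform b-bit quantization in the forward pass; identity (straight-through) gradient in backward."""

    @staticmethod
    def forward(ctx, x, scale, bits):
        qmin = -(2 ** (bits - 1))        # e.g. -4 for 3 bits
        qmax = 2 ** (bits - 1) - 1       # e.g.  3 for 3 bits
        q = torch.clamp(torch.round(x / scale), qmin, qmax)
        return q * scale                 # dequantized ("fake-quantized") values

    @staticmethod
    def backward(ctx, grad_output):
        # STE: forward the gradient through the non-differentiable round/clamp unchanged.
        return grad_output, None, None

def fake_quantize(x, bits=3):
    # Illustrative per-tensor scale; real QAT schemes choose or learn the scale differently.
    scale = x.detach().abs().max() / (2 ** (bits - 1) - 1) + 1e-8
    return FakeQuantSTE.apply(x, scale, bits)

# Example usage: x_q = fake_quantize(torch.randn(8), bits=3)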

Bibliographic Details
Main Authors: Artem Sher, Anton Trusov, Elena Limonova, Dmitry Nikolaev, Vladimir V. Arlazarov
Format: Article
Language: English
Published: MDPI AG, 2023-04-01
Series: Mathematics
Subjects: quantized neural network; low-bit quantization; layer-by-layer; neuron-by-neuron training
Online Access: https://www.mdpi.com/2227-7390/11/9/2112
DOI: 10.3390/math11092112
Journal Reference: Mathematics, Vol. 11, No. 9, Article 2112 (April 2023)
ISSN: 2227-7390
Author Affiliations:
Artem Sher and Anton Trusov: Phystech School of Applied Mathematics and Informatics, Moscow Institute of Physics and Technology, 141701 Moscow, Russia
Elena Limonova, Dmitry Nikolaev, and Vladimir V. Arlazarov: Smart Engines Service LLC, 117312 Moscow, Russia