4.6-Bit Quantization for Fast and Accurate Neural Network Inference on CPUs
Quantization is a widespread method for reducing the inference time of neural networks on mobile Central Processing Units (CPUs). Eight-bit quantized networks demonstrate quality comparable to that of full-precision models and fit the hardware architecture well, with one-byte coefficients and thirty...
Main Authors: Anton Trusov, Elena Limonova, Dmitry Nikolaev, Vladimir V. Arlazarov
Format: Article
Language: English
Published: MDPI AG, 2024-02-01
Series: Mathematics
Online Access: https://www.mdpi.com/2227-7390/12/5/651
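As background to the abstract's point about one-byte coefficients and wider accumulators, here is a minimal sketch of standard symmetric 8-bit quantization with a 32-bit integer accumulator. This is illustrative only, not the paper's 4.6-bit scheme; the function names and the per-tensor scaling choice are assumptions for the example.

```python
import numpy as np

def quantize_int8(x):
    """Map a float array to int8 using a per-tensor scale (assumed symmetric scheme)."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_dot(qa, sa, qb, sb):
    """Dot product of int8 vectors with 32-bit accumulation, then dequantize."""
    acc = np.dot(qa.astype(np.int32), qb.astype(np.int32))  # int32 accumulator
    return float(acc) * (sa * sb)

a = np.array([0.5, -1.2, 0.3], dtype=np.float32)
b = np.array([1.0, 0.7, -0.4], dtype=np.float32)
qa, sa = quantize_int8(a)
qb, sb = quantize_int8(b)
approx = int8_dot(qa, sa, qb, sb)   # quantized result
exact = float(np.dot(a, b))         # full-precision reference
```

With one-byte coefficients, the integer dot product stays exact inside the 32-bit accumulator; the only error comes from rounding the inputs, which is why 8-bit networks track full-precision quality closely.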
Similar Items
- Neuron-by-Neuron Quantization for Efficient Low-Bit QNN Training
  by: Artem Sher, et al.
  Published: (2023-04-01)
- CANET: Quantized Neural Network Inference With 8-bit Carry-Aware Accumulator
  by: Jingxuan Yang, et al.
  Published: (2024-01-01)
- Training Multi-Bit Quantized and Binarized Networks with a Learnable Symmetric Quantizer
  by: Phuoc Pham, et al.
  Published: (2021-01-01)
- Design of a 2-Bit Neural Network Quantizer for Laplacian Source
  by: Zoran Perić, et al.
  Published: (2021-07-01)
- Clipping-Based Post Training 8-Bit Quantization of Convolution Neural Networks for Object Detection
  by: Leisheng Chen, et al.
  Published: (2022-12-01)