Optimising Hardware Accelerated Neural Networks with Quantisation and a Knowledge Distillation Evolutionary Algorithm

Bibliographic Details
Main Authors:	Robert Stewart, Andrew Nowlan, Pascal Bacchus, Quentin Ducasse, Ekaterina Komendantskaya
Format:	Article
Language:	English
Published:	MDPI AG 2021-02-01
Series:	Electronics
Subjects:	quantisation; evolutionary algorithm; neural network; FPGA; Movidius VPU
Online Access:	https://www.mdpi.com/2079-9292/10/4/396

Description
Summary:	This paper compares the latency, accuracy, training time and hardware costs of neural networks compressed with our new multi-objective evolutionary algorithm called NEMOKD, and with quantisation. We evaluate NEMOKD on Intel’s Movidius Myriad X VPU processor, and quantisation on Xilinx’s programmable Z7020 FPGA hardware. Evolving models with NEMOKD increases inference accuracy by up to 82% at the cost of 38% increased latency, with throughput performance of 100–590 image frames-per-second (FPS). Quantisation identifies a sweet spot of 3 bit precision in the trade-off between latency, hardware requirements, training time and accuracy. Parallelising FPGA implementations of 2 and 3 bit quantised neural networks increases throughput from 6 k FPS to 373 k FPS, a 62× speedup.
ISSN:	2079-9292

Similar Items