Model compression and simplification pipelines for fast deep neural network inference in FPGAs in HEP
Abstract: Resource utilization plays a crucial role in the successful implementation of fast real-time inference for deep neural networks (DNNs) and convolutional neural networks (CNNs) on the latest generation of hardware accelerators (FPGAs, SoCs, ACAPs, GPUs). To fulfil the needs of the triggers under development for the upgraded LHC detectors, we have developed a multi-stage compression approach that combines conventional compression strategies (pruning and quantization), which reduce the memory footprint of the model, with knowledge-transfer techniques, which streamline the DNNs, simplify the synthesis phase in the FPGA firmware, and improve explainability. We present the developed methodologies and the results of their implementation in a working engineering pipeline used as a pre-processing stage for high-level synthesis tools (HLS4ML, Xilinx Vivado HLS, etc.). We show how ultra-light deep neural networks can be built in practice by applying the method to a realistic HEP use case: a toy simulation of one of the triggers planned for the HL-LHC.
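To make the three ingredients named in the abstract concrete, here is a minimal sketch of a compression pipeline of that kind: magnitude pruning, quantization-aware training with QKeras, and teacher-to-student knowledge transfer. This is illustrative only, not the authors' actual pipeline; it assumes the `tensorflow`, `tensorflow-model-optimization` and `qkeras` Python packages, and every layer size, bit width, sparsity target and hyperparameter (`quantized_student`, `prune_model`, `distill`, `temperature`, `alpha`) is a placeholder introduced here for illustration.

```python
# Illustrative sketch (NOT the authors' code) of a multi-stage compression
# pipeline: magnitude pruning + quantization-aware training + knowledge
# transfer. All sizes, bit widths and hyperparameters are placeholders.
import tensorflow as tf
import tensorflow_model_optimization as tfmot
from qkeras import QDense, QActivation, quantized_bits, quantized_relu


def quantized_student(n_in: int, n_out: int, bits: int = 6) -> tf.keras.Model:
    """Ultra-light student whose weights/activations are quantized during training."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_in,)),
        QDense(32, kernel_quantizer=quantized_bits(bits, 0, alpha=1),
               bias_quantizer=quantized_bits(bits, 0)),
        QActivation(quantized_relu(bits)),
        QDense(n_out, kernel_quantizer=quantized_bits(bits, 0, alpha=1),
               bias_quantizer=quantized_bits(bits, 0)),
        tf.keras.layers.Activation("softmax"),
    ])


def prune_model(model: tf.keras.Model, target_sparsity: float = 0.75):
    """Wrap a model built from standard Keras layers with magnitude pruning;
    the sparsity mask grows from 0 to `target_sparsity` over 2000 steps."""
    schedule = tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=target_sparsity,
        begin_step=0, end_step=2000)
    return tfmot.sparsity.keras.prune_low_magnitude(
        model, pruning_schedule=schedule)


def distill(teacher, student, x, y_hard, temperature=4.0, alpha=0.5,
            callbacks=()):
    """Simple blended-target knowledge transfer: the student fits a convex
    mix of the hard labels and the teacher's temperature-softened
    predictions. Assumes the teacher's last layer returns logits."""
    y_soft = tf.nn.softmax(teacher.predict(x) / temperature).numpy()
    y_mix = alpha * y_hard + (1.0 - alpha) * y_soft
    student.compile(optimizer="adam", loss="categorical_crossentropy",
                    metrics=["accuracy"])
    student.fit(x, y_mix, batch_size=256, epochs=30,
                callbacks=list(callbacks))
    return student
```

When a model has been wrapped with `prune_model`, pass `tfmot.sparsity.keras.UpdatePruningStep()` in `callbacks` during training and call `tfmot.sparsity.keras.strip_pruning(...)` afterwards to bake in the sparse weights. The compressed model can then be handed to one of the high-level synthesis tools the abstract mentions; with hls4ml the hand-off is roughly:

```python
import hls4ml

# Generate a per-layer configuration and an HLS project from the Keras
# model; compile() builds a C simulation for bit-accurate checks before
# running the actual FPGA synthesis. Paths and granularity are examples.
config = hls4ml.utils.config_from_keras_model(student, granularity="name")
hls_model = hls4ml.converters.convert_from_keras_model(
    student, hls_config=config, output_dir="hls_prj")
hls_model.compile()
```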
Main Authors: | Simone Francescato (Department of Physics, Harvard University); Stefano Giagu (Department of Physics, Sapienza University and INFN Sezione di Roma); Federica Riti (Department of Physics, ETH Zürich); Graziella Russo, Luigi Sabetta and Federico Tortonesi (Department of Physics, Sapienza University and INFN Sezione di Roma) |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2021-11-01 |
Series: | European Physical Journal C: Particles and Fields |
ISSN: | 1434-6044, 1434-6052 |
Online Access: | https://doi.org/10.1140/epjc/s10052-021-09770-w |