Fast convolutional neural networks on FPGAs with hls4ml

Abstract: We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with convolutional layers on field-programmable gate arrays (FPGAs). By extending the hls4ml library, we demonstrate an inference latency of 5 µs using convolutional architectures, targeting microsecond-latency applications like those at the CERN Large Hadron Collider. Considering benchmark models trained on the Street View House Numbers Dataset, we demonstrate various methods for model compression in order to fit the computational constraints of a typical FPGA device used in trigger and data acquisition systems of particle detectors. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be significantly reduced with little to no loss in model accuracy. We show that the FPGA critical resource consumption can be reduced by 97% with zero loss in model accuracy, and by 99% when tolerating a 6% accuracy degradation.
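To make the workflow described in the abstract concrete, the sketch below is a minimal, hypothetical example: it builds a small quantization-aware CNN for 32x32x3 SVHN-style inputs with QKeras and converts it to an HLS project with the hls4ml Python API. The layer sizes, bit widths, reuse factor, and FPGA part string are placeholders chosen for illustration, not the configurations evaluated in the paper.

# Illustrative sketch only: architecture, bit widths, reuse factor and FPGA part
# are placeholder choices, not the settings reported in the paper.
import hls4ml
from qkeras import QConv2D, QDense, QActivation
from qkeras.quantizers import quantized_bits, quantized_relu
from tensorflow.keras.layers import Input, MaxPooling2D, Flatten, Activation
from tensorflow.keras.models import Model

quant = quantized_bits(6, 0, alpha=1)  # 6-bit weights and biases (example width)

# Small quantization-aware CNN for 32x32x3 SVHN-style images, 10 output classes.
inputs = Input(shape=(32, 32, 3))
x = QConv2D(16, (3, 3), kernel_quantizer=quant, bias_quantizer=quant)(inputs)
x = QActivation(quantized_relu(6))(x)
x = MaxPooling2D((2, 2))(x)
x = QConv2D(16, (3, 3), kernel_quantizer=quant, bias_quantizer=quant)(x)
x = QActivation(quantized_relu(6))(x)
x = MaxPooling2D((2, 2))(x)
x = Flatten()(x)
x = QDense(42, kernel_quantizer=quant, bias_quantizer=quant)(x)
x = QActivation(quantized_relu(6))(x)
x = QDense(10, kernel_quantizer=quant, bias_quantizer=quant)(x)
outputs = Activation('softmax')(x)
model = Model(inputs, outputs)
# ... compile and train the model here (quantization-aware training) ...

# Convert the trained Keras/QKeras model into an HLS project with hls4ml.
config = hls4ml.utils.config_from_keras_model(model, granularity='name')
config['Model']['ReuseFactor'] = 1  # fully parallel; raise to trade latency for resources

hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    io_type='io_stream',             # streaming I/O, used for convolutional layers
    output_dir='hls4ml_cnn_prj',
    part='xcvu9p-flga2104-2L-e',     # example Xilinx VU9P part; set to your target device
)
hls_model.compile()                  # C simulation of the generated HLS code
# hls_model.build(csim=False, synth=True)  # run HLS synthesis for resource/latency estimates

Pruning, the other compression method the abstract mentions, would typically be applied to the Keras model before conversion (for example with TensorFlow Model Optimization's prune_low_magnitude); the per-layer precision and reuse factor in the hls4ml configuration are the main knobs for trading accuracy and latency against FPGA resource usage.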


Bibliographic Details
Main Authors: Aarrestad, Thea, Loncar, Vladimir, Ghielmetti, Nicolò, Pierini, Maurizio, Summers, Sioni, Ngadiuba, Jennifer, Petersson, Christoffer, Linander, Hampus, Iiyama, Yutaro, Di Guglielmo, Giuseppe, Duarte, Javier, Harris, Philip, Rankin, Dylan, Jindariani, Sergo, Pedro, Kevin, Tran, Nhan, Liu, Mia, Kreinar, Edward, Wu, Zhenbin, Hoang, Duc
Other Authors: Massachusetts Institute of Technology. Department of Physics
Format: Article
Language: English
Published: IOP Publishing 2022
Online Access: https://hdl.handle.net/1721.1/142113
Citation: Aarrestad, Thea, Loncar, Vladimir, Ghielmetti, Nicolò, Pierini, Maurizio, Summers, Sioni et al. 2021. "Fast convolutional neural networks on FPGAs with hls4ml." Machine Learning: Science and Technology, 2 (4).
Journal: Machine Learning: Science and Technology
DOI: 10.1088/2632-2153/AC0EA1
Rights: Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/)