ANNETTE: Accurate Neural Network Execution Time Estimation With Stacked Models

With new accelerator hardware for Deep Neural Networks (DNNs), the computing power for Artificial Intelligence (AI) applications has increased rapidly. However, as DNN algorithms become more complex and optimized for specific applications, latency requirements remain challenging, and it is critical...

Bibliographic Details
Main Authors:	Matthias Wess, Matvey Ivanov, Christoph Unger, Anvesh Nookala, Alexander Wendt, Axel Jantsch
Format:	Article
Language:	English
Published:	IEEE 2021-01-01
Series:	IEEE Access
Subjects:	Analytical models, estimation, neural network hardware
Online Access:	https://ieeexplore.ieee.org/document/9306831/

_version_	1798031050228629504
author	Matthias Wess, Matvey Ivanov, Christoph Unger, Anvesh Nookala, Alexander Wendt, Axel Jantsch
author_facet	Matthias Wess, Matvey Ivanov, Christoph Unger, Anvesh Nookala, Alexander Wendt, Axel Jantsch
author_sort	Matthias Wess
collection	DOAJ
description	With new accelerator hardware for Deep Neural Networks (DNNs), the computing power for Artificial Intelligence (AI) applications has increased rapidly. However, as DNN algorithms become more complex and optimized for specific applications, latency requirements remain challenging, and it is critical to find the optimal points in the design space. To decouple the architectural search from the target hardware, we propose a time estimation framework that allows for modeling the inference latency of DNNs on hardware accelerators based on mapping and layer-wise estimation models. The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation. For evaluation, we compare the estimation accuracy and fidelity of the generated mixed models and statistical models with the roofline model and a refined roofline model. We test the mixed models on the ZCU102 SoC board with Xilinx Deep Neural Network Development Kit (DNNDK) and Intel Neural Compute Stick 2 (NCS2) on a set of 12 state-of-the-art neural networks. The mixed models show an average estimation error of 3.47% for the DNNDK and 7.44% for the NCS2, outperforming the statistical and analytical layer models for almost all selected networks. For a randomly selected subset of 34 networks of the NASBench dataset, the mixed model reaches a fidelity of 0.988 in Spearman’s $\rho$ rank correlation coefficient metric.
first_indexed	2024-04-11T19:49:56Z
format	Article
id	doaj.art-1546e71ebabf4c1381e2e6170570a848
institution	Directory of Open Access Journals
issn	2169-3536
language	English
last_indexed	2024-04-11T19:49:56Z
publishDate	2021-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-1546e71ebabf4c1381e2e6170570a848; 2022-12-22T04:06:19Z; eng; IEEE; IEEE Access; 2169-3536; 2021-01-01; vol. 9, pp. 3545-3556; doi:10.1109/ACCESS.2020.3047259; 9306831; ANNETTE: Accurate Neural Network Execution Time Estimation With Stacked Models; Matthias Wess (https://orcid.org/0000-0002-1877-4114), Matvey Ivanov, Christoph Unger, Anvesh Nookala, Alexander Wendt (https://orcid.org/0000-0002-4909-0006), Axel Jantsch (https://orcid.org/0000-0003-2251-0004); all authors: Institute of Computer Technology, TU Wien, Vienna, Austria; https://ieeexplore.ieee.org/document/9306831/; Analytical models, estimation, neural network hardware
spellingShingle	Matthias Wess Matvey Ivanov Christoph Unger Anvesh Nookala Alexander Wendt Axel Jantsch ANNETTE: Accurate Neural Network Execution Time Estimation With Stacked Models IEEE Access Analytical models estimation neural network hardware
title	ANNETTE: Accurate Neural Network Execution Time Estimation With Stacked Models
title_full	ANNETTE: Accurate Neural Network Execution Time Estimation With Stacked Models
title_fullStr	ANNETTE: Accurate Neural Network Execution Time Estimation With Stacked Models
title_full_unstemmed	ANNETTE: Accurate Neural Network Execution Time Estimation With Stacked Models
title_short	ANNETTE: Accurate Neural Network Execution Time Estimation With Stacked Models
title_sort	annette accurate neural network execution time estimation with stacked models
topic	Analytical models, estimation, neural network hardware
url	https://ieeexplore.ieee.org/document/9306831/

Similar Items