AI-Driven Performance Modeling for AI Inference Workloads

Deep Learning (DL) workloads are increasingly deployed not only in cloud datacenters but also on local devices. Although these deployments are mostly limited to inference tasks, they still widen the range of possible target architectures significantly. These new targets usually offer drastically lower compute performance and smaller memories than traditionally used architectures, and, because they often run on batteries, they shift the key optimization focus to efficiency. Performance models can help developers quickly estimate the performance of a neural network during its design phase. However, such models are expensive to implement, as they require in-depth knowledge of the hardware architecture and the algorithms used. Although AI-based solutions exist, they either require large datasets that are difficult to collect on low-performance targets, or are limited to a small number of target platforms and metrics. Our solution exploits the block-based structure of neural networks, as well as the high similarity of the layer configurations typically used across neural networks, enabling the training of accurate models on significantly smaller datasets. In addition, our solution is not limited to a specific architecture or metric. We demonstrate its feasibility on a set of seven devices from four different hardware architectures, with up to three performance metrics per target, including power consumption and memory footprint. In our tests, the solution achieved an error of less than 1 ms (2.6%) in latency, 0.12 J (4%) in energy consumption, and 11 MiB (1.5%) in memory allocation for whole-network inference prediction, while being up to five orders of magnitude faster than a benchmark.

Bibliographic Details
Main Authors: Max Sponner (Infineon Technologies Dresden GmbH & Co. KG, 01099 Dresden, Germany); Bernd Waschneck (Infineon Technologies Dresden GmbH & Co. KG, 01099 Dresden, Germany); Akash Kumar (Center for Advancing Electronics Dresden (CFAED), Technical University (TU) Dresden, 01062 Dresden, Germany)
Format: Article
Language: English
Published: MDPI AG, 2022-07-01
Series: Electronics, Volume 11, Issue 15, Article 2316
ISSN: 2079-9292
DOI: 10.3390/electronics11152316
Subjects: performance modeling; machine learning; regression models
Online Access: https://www.mdpi.com/2079-9292/11/15/2316
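
To make the approach concrete, below is a minimal Python sketch of the idea described in the abstract: one regression model is trained per layer type on measured per-layer latencies, and a whole-network estimate is obtained by summing the per-layer predictions. This is an illustration only, not the authors' implementation; the LayerSample structure, the feature set, and the use of scikit-learn's RandomForestRegressor are assumptions, and the same pattern would apply to the energy and memory metrics mentioned in the paper.

```python
# Illustrative sketch, not the authors' code: train one regressor per layer
# type on measured layer benchmarks, then estimate whole-network latency as
# the sum of per-layer predictions. Names and features are hypothetical.
from dataclasses import dataclass
from collections import defaultdict
from sklearn.ensemble import RandomForestRegressor


@dataclass
class LayerSample:
    layer_type: str        # e.g. "conv2d", "dense" (illustrative)
    features: list[float]  # e.g. [in_ch, out_ch, kernel, spatial, macs]
    latency_ms: float      # latency measured on the target device


class PerLayerLatencyModel:
    """Per-layer-type regressors, standing in for the paper's per-block models."""

    def __init__(self) -> None:
        self.models: dict[str, RandomForestRegressor] = {}

    def fit(self, samples: list[LayerSample]) -> None:
        # Group measured samples by layer type and fit one regressor each.
        by_type: dict[str, list[LayerSample]] = defaultdict(list)
        for s in samples:
            by_type[s.layer_type].append(s)
        for layer_type, group in by_type.items():
            X = [s.features for s in group]
            y = [s.latency_ms for s in group]
            model = RandomForestRegressor(n_estimators=100, random_state=0)
            model.fit(X, y)
            self.models[layer_type] = model

    def predict_network(self, layers: list[tuple[str, list[float]]]) -> float:
        # Whole-network latency estimate = sum of per-layer predictions.
        return sum(
            float(self.models[layer_type].predict([features])[0])
            for layer_type, features in layers
        )
```

Because a prediction is only a handful of regressor evaluations rather than an on-device run, an estimate of this kind can be orders of magnitude faster than benchmarking, which is the speedup the abstract reports.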