Latency Estimation Tool and Investigation of Neural Networks Inference on Mobile GPU

Many deep learning applications are intended to run on mobile devices, and for many of them both accuracy and inference time matter. While the number of FLOPs is commonly used as a proxy for neural network latency, it may not be the best choice. To obtain a better approximation of latency, the research community builds lookup tables over all possible layers and uses them to estimate inference time on a mobile CPU; this requires only a small number of experiments. Unfortunately, on a mobile GPU this method is not applicable in a straightforward way and shows low precision. In this work, we treat latency approximation on a mobile GPU as a data- and hardware-specific problem. Our main goal is to construct a convenient Latency Estimation Tool for Investigation (LETI) of neural network inference and to build robust and accurate latency prediction models for each specific task. To achieve this goal, we provide tools for conducting massive experiments on different target devices, with a focus on mobile GPUs. After evaluating the dataset, one can train a regression model on the experimental data and use it for future latency prediction and analysis. We experimentally demonstrate the applicability of this approach on a subset of the popular NAS-Benchmark 101 dataset for two different mobile GPUs.
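
The two estimation strategies the abstract contrasts, a per-layer lookup table and a regression model trained on measured latencies, can be sketched roughly as below. This is a minimal illustration, not LETI's actual interface: the dictionary-based architecture description, the feature set, and the choice of scikit-learn's gradient boosting regressor are all assumptions made for the sake of the example.

```python
# Hedged sketch of the two latency-estimation strategies from the abstract.
# Everything here (feature names, dict-based nets, the regressor) is an
# illustrative assumption, not the paper's actual LETI implementation.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split


def lookup_table_latency(layers, table):
    """CPU-style estimate: sum per-layer latencies measured once per layer type.
    The abstract notes this transfers poorly to mobile GPUs."""
    return sum(table[layer] for layer in layers)


def featurize(net):
    """Map an architecture description to a fixed-length feature vector.
    Simple counts are assumed here; the real tool may use richer features."""
    return np.array(
        [net["flops"], net["params"], net["num_convs"], net["num_dense"]],
        dtype=float,
    )


def fit_latency_model(nets, latencies_ms):
    """Fit a regression model on (architecture, measured GPU latency) pairs
    and report its error on held-out architectures."""
    X = np.stack([featurize(n) for n in nets])
    y = np.asarray(latencies_ms, dtype=float)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
    mape = mean_absolute_percentage_error(y_te, model.predict(X_te))
    print(f"held-out MAPE: {mape:.2%}")
    return model  # model.predict(featurize(new_net)[None]) estimates latency
```

Because the model is trained on measurements taken from one specific device, it is hardware-specific by construction, which matches the paper's framing of latency estimation as a data- and hardware-specific problem.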

Bibliographic Details
Main Authors: Evgeny Ponomarev (Skolkovo Institute of Science and Technology, Moscow, Russia), Sergey Matveev (Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Moscow, Russia), Ivan Oseledets (Skolkovo Institute of Science and Technology, Moscow, Russia), Valery Glukhov (Noah's Ark Lab., Huawei Technologies, Moscow, Russia)
Format: Article
Language: English
Published: MDPI AG, 2021-08-01
Series: Computers
ISSN: 2073-431X
DOI: 10.3390/computers10080104
Subjects: latency; inference; mobile GPU; neural architecture search
Online Access: https://www.mdpi.com/2073-431X/10/8/104