PartitionTuner: An operator scheduler for deep-learning compilers supporting multiple heterogeneous processing units

Recently, embedded systems, such as mobile platforms, have included multiple processing units (PUs) that can operate in parallel, such as central processing units (CPUs) and neural processing units (NPUs). Deep-learning compilers can be used to generate machine code optimized for these embedded systems from a deep neural network (DNN). However, the deep-learning compilers proposed so far generate code that executes DNN operators sequentially on a single processing unit, or parallel code only for graphics processing units (GPUs). In this study, we propose PartitionTuner, an operator scheduler for deep-learning compilers that supports multiple heterogeneous PUs, including CPUs and NPUs. PartitionTuner can generate an operator-scheduling plan that uses all available PUs simultaneously to minimize overall DNN inference time. Operator scheduling is based on an analysis of the DNN architecture and the performance profiles of individual and grouped operators measured on the heterogeneous PUs. In experiments on seven DNNs, PartitionTuner generates scheduling plans that perform 5.03% better than a static type-based operator-scheduling technique for SqueezeNet. In addition, PartitionTuner outperforms recent profiling-based operator-scheduling techniques on ResNet50, ResNet18, and SqueezeNet by 7.18%, 5.36%, and 2.73%, respectively.
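To make the scheduling idea in the abstract concrete, the following is a minimal, illustrative sketch of profile-driven operator scheduling: each operator in a topologically ordered DNN graph is greedily assigned to the processing unit that gives the earliest estimated finish time according to measured per-PU latencies. The greedy earliest-finish-time heuristic, the operator names, and the latency values are assumptions made purely for illustration; this is not the PartitionTuner algorithm described in the paper.

```python
# Minimal illustrative sketch of profile-driven operator scheduling across
# heterogeneous processing units (PUs). NOT the PartitionTuner algorithm:
# the greedy earliest-finish-time heuristic and all names/latencies below
# are assumptions made only to illustrate the general idea in the abstract.

from dataclasses import dataclass

@dataclass
class Operator:
    name: str
    deps: list[str]             # names of predecessor operators in the DNN graph
    latency: dict[str, float]   # profiled latency per PU, e.g. {"cpu": 2.0, "npu": 0.5}

def schedule(ops):
    """Assign each operator (given in topological order) to the PU that
    yields the earliest estimated finish time under the profiled latencies."""
    pu_free = {}   # PU name -> time at which that PU becomes free
    finish = {}    # operator name -> (finish time, assigned PU)
    plan = []
    for op in ops:
        # An operator may start only after all of its predecessors finish.
        ready = max((finish[d][0] for d in op.deps), default=0.0)
        best_pu, best_finish = None, float("inf")
        for pu, lat in op.latency.items():
            start = max(ready, pu_free.get(pu, 0.0))
            if start + lat < best_finish:
                best_pu, best_finish = pu, start + lat
        pu_free[best_pu] = best_finish
        finish[op.name] = (best_finish, best_pu)
        plan.append((op.name, best_pu, best_finish))
    return plan

if __name__ == "__main__":
    # Toy four-operator graph with hypothetical profiled latencies (ms).
    graph = [
        Operator("conv1", [], {"cpu": 3.0, "npu": 0.8}),
        Operator("conv2", ["conv1"], {"cpu": 2.5, "npu": 0.7}),
        Operator("pool1", ["conv1"], {"cpu": 0.4, "npu": 0.6}),
        Operator("fc1", ["conv2", "pool1"], {"cpu": 1.2, "npu": 0.5}),
    ]
    for name, pu, t in schedule(graph):
        print(f"{name:6s} -> {pu:4s} (estimated finish: {t:.1f} ms)")
```

With these toy latencies, the pooling operator lands on the CPU while the convolutions and the fully connected layer go to the NPU, so the two PUs overlap in time; producing such plans automatically from profiles is the kind of outcome the paper's scheduler targets.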

Bibliographic Details
Main Authors: Misun Yu, Yongin Kwon, Jemin Lee, Jeman Park, Junmo Park, Taeho Kim
Format: Article
Language: English
Published: Electronics and Telecommunications Research Institute (ETRI), 2023-04-01
Series: ETRI Journal, Vol. 45, No. 2, pp. 318-328
ISSN: 1225-6463
Subjects: deep neural network; deep-learning compiler; parallel processing; partitioning
Online Access: https://doi.org/10.4218/etrij.2021-0446