Energy-Efficient Dataflow Scheduling of CNN Applications for Vector-SIMD DSP

Bibliographic Details
Main Authors: Wontae Kim, Sangheon Lee, Ilwi Yun, Chulhee Lee, Kyujoong Lee, Hyuk-Jae Lee
Format: Article
Language: English
Published: IEEE, 2022-01-01
Series: IEEE Access
Subjects: DSP; SIMD; CNN; memory access; energy consumption reduction
Online Access: https://ieeexplore.ieee.org/document/9858336/

Abstract
Dataflow-scheduling techniques for convolutional neural networks (CNNs) have been studied extensively to minimize off-chip memory access. However, the efficiency of previously proposed techniques is limited because their optimizations consider only general-purpose hardware such as FPGAs and GPUs. To overcome this limitation, this paper proposes a dataflow-scheduling technique for vector-SIMD DSPs that minimizes the energy consumed by off-chip memory access. First, the proposed technique groups as many of the given layers as possible. Within a group, the tiles of different layers are executed in sequence without off-chip memory access, except for the first and last layers of the group. The number of layers in each group is chosen to minimize off-chip memory energy, as estimated by the proposed off-chip memory energy model. However, grouping layers incurs additional computation; to minimize this overhead, this paper solves an optimization problem for the grouped layers. Second, for layers that cannot be grouped, tiling along the W-axis is not applied, which maximizes the amount of overlapping data between consecutive tiles. Consequently, reuse of the overlapping data in the on-chip buffer is maximized, reducing the energy consumed by off-chip memory. For evaluation, a cycle-accurate simulation environment is built that measures off-chip memory energy by tracing the data transferred between a vector-SIMD DSP and off-chip memory. Experimental results show that, compared with baseline tiling and scheduling techniques, the proposed technique reduces energy consumption by an average of 51% for CNN applications such as Tiny YOLOv2, MobileNetV1, and VDSR.
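
As a rough illustration of the grouping step described in the abstract, the Python sketch below scores contiguous layer groups with a toy energy model and picks the cheapest partition. It is not the authors' energy model or optimizer: the flat per-byte off-chip cost (E_DRAM_PJ_PER_BYTE), the per-MAC recomputation cost (E_MAC_PJ), the buffer size (BUFFER_BYTES), the stride-1 3x3 layer shapes, and every name (Conv, simulate_tiling, group_cost, partition) are hypothetical. Tiles are full-width row bands (no tiling along W), overlapping rows between consecutive tiles are assumed to stay in the on-chip buffer so each off-chip byte moves once, and the additional computation of grouping is modeled as recomputation of halo rows shared by adjacent tiles. A brute-force search over tile heights and a dynamic program over group boundaries stand in for the paper's optimization.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Illustrative constants (assumptions, not values from the paper).
E_DRAM_PJ_PER_BYTE = 100.0  # off-chip energy per byte moved
E_MAC_PJ = 1.0              # on-chip energy per multiply-accumulate
BUFFER_BYTES = 256 * 1024   # on-chip buffer capacity


@dataclass
class Conv:
    """Stride-1, 'same'-padded k x k convolution on 8-bit data (assumed)."""
    h: int
    w: int
    c_in: int
    c_out: int
    k: int = 3

    def weight_bytes(self) -> int:
        return self.k * self.k * self.c_in * self.c_out

    def macs_per_row(self) -> int:  # MACs for one full-width output row
        return self.w * self.c_out * self.k * self.k * self.c_in


def simulate_tiling(group: List[Conv], t: int) -> Tuple[int, int]:
    """Full-width row bands of t output rows of the last layer per tile.

    Walking back through the group, each layer needs (k - 1) // 2 halo rows on
    each side, so earlier layers recompute rows shared by adjacent tiles.
    Returns (redundant MACs, peak bytes of one layer's input + output band).
    """
    h = group[-1].h
    computed = [0] * len(group)
    peak = 0
    for start in range(0, h, t):
        lo, hi = start, min(start + t, h)  # rows the current layer must output
        for j in range(len(group) - 1, -1, -1):
            layer = group[j]
            pad = (layer.k - 1) // 2
            in_lo, in_hi = max(0, lo - pad), min(h, hi + pad)
            computed[j] += hi - lo
            band = ((in_hi - in_lo) * layer.c_in + (hi - lo) * layer.c_out) * layer.w
            peak = max(peak, band)
            lo, hi = in_lo, in_hi  # the previous layer must produce these rows
    extra = sum((computed[j] - layer.h) * layer.macs_per_row()
                for j, layer in enumerate(group))
    return extra, peak


def group_cost(group: List[Conv], buffer_bytes: int) -> Optional[Tuple[float, int]]:
    """Lowest modeled energy (pJ) and its tile height, or None if nothing fits."""
    first, last = group[0], group[-1]
    weights = sum(layer.weight_bytes() for layer in group)
    # Only the group's first input and last output cross the off-chip interface.
    io = first.h * first.w * first.c_in + last.h * last.w * last.c_out
    dram_pj = (io + weights) * E_DRAM_PJ_PER_BYTE
    best = None
    for t in range(1, last.h + 1):  # brute-force search over tile heights
        extra, peak = simulate_tiling(group, t)
        if weights + peak <= buffer_bytes:
            energy = dram_pj + extra * E_MAC_PJ
            if best is None or energy < best[0]:
                best = (energy, t)
    return best


def partition(layers: List[Conv], buffer_bytes: int = BUFFER_BYTES, max_group: int = 4):
    """Dynamic program over contiguous groups; returns (energy, [(i, j, t)])."""
    n = len(layers)
    best = [0.0] + [math.inf] * n
    choice: List[Optional[Tuple[int, int]]] = [None] * (n + 1)
    for j in range(1, n + 1):
        for i in range(max(0, j - max_group), j):
            cost = group_cost(layers[i:j], buffer_bytes)
            if cost is not None and best[i] + cost[0] < best[j]:
                best[j], choice[j] = best[i] + cost[0], (i, cost[1])
        if choice[j] is None:
            raise ValueError(f"layer {j - 1} alone does not fit the buffer")
    groups, j = [], n
    while j > 0:
        i, t = choice[j]
        groups.append((i, j, t))  # layers i..j-1 run as one group, tile height t
        j = i
    return best[n], groups[::-1]


if __name__ == "__main__":
    net = [Conv(56, 56, 16, 32), Conv(56, 56, 32, 32),
           Conv(56, 56, 32, 64), Conv(56, 56, 64, 64)]
    energy, groups = partition(net)
    for i, j, t in groups:
        print(f"layers {i}..{j - 1}: tile height {t}")
    print(f"modeled energy: {energy / 1e6:.2f} uJ")
```

Running the module prints the chosen groups and tile heights for a hypothetical four-layer network. Lowering BUFFER_BYTES tends to force shorter tiles, which raises the recomputation term and eventually favors shorter groups; this mirrors the trade-off the abstract describes between group length, off-chip energy, and additional computation.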

Publication Details
Volume: 10
Pages: 86234-86247
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2022.3197206
Affiliations:
Wontae Kim, Sangheon Lee, Ilwi Yun, Hyuk-Jae Lee (ORCID: https://orcid.org/0000-0001-8895-9117): Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea
Chulhee Lee: System LSI Division, Samsung Electronics Corporation, Hwaseong, South Korea
Kyujoong Lee (ORCID: https://orcid.org/0000-0002-3080-3010): School of AI Convergence, Sungshin Women’s University, Seoul, South Korea