Energy-Efficient Dataflow Scheduling of CNN Applications for Vector-SIMD DSP

Dataflow-scheduling techniques for convolutional neural networks (CNNs) have been studied extensively to minimize off-chip memory access. However, the efficiency of previously proposed techniques is limited because their optimizations consider only general hardware such as FPGAs and GPUs. To...

Bibliographic Details
Main Authors:	Wontae Kim, Sangheon Lee, Ilwi Yun, Chulhee Lee, Kyujoong Lee, Hyuk-Jae Lee
Format:	Article
Language:	English
Published:	IEEE 2022-01-01
Series:	IEEE Access
Subjects:	DSP, SIMD, CNN, memory access, energy consumption reduction
Online Access:	https://ieeexplore.ieee.org/document/9858336/

author	Wontae Kim, Sangheon Lee, Ilwi Yun, Chulhee Lee, Kyujoong Lee, Hyuk-Jae Lee
author_sort	Wontae Kim
collection	DOAJ
description	Dataflow-scheduling techniques for convolutional neural networks (CNNs) have been studied extensively to minimize off-chip memory access. However, the efficiency of previously proposed techniques is limited because their optimizations consider only general hardware such as FPGAs and GPUs. To overcome this limitation, this paper proposes dataflow scheduling for a vector-SIMD DSP that minimizes the energy consumed by off-chip memory access. First, the proposed technique attempts to group as many of the given layers as possible. Within a group, tiles in different layers are executed in sequence without off-chip memory access, except in the first and last layers of the group. The number of layers in a group is chosen to minimize the off-chip memory energy, as estimated by the proposed energy model of the off-chip memory. However, grouping layers introduces additional computation. To minimize this overhead, this paper solves an optimization problem within the grouped layers. Second, for layers that cannot be grouped, tiling along the W-axis is not considered, which maximizes the size of the data overlapped between consecutive tiles. Consequently, reuse of the overlapped data in the on-chip buffer is maximized, reducing the energy consumed by off-chip memory. For evaluation, a cycle-accurate simulation environment is established to measure the off-chip memory energy consumption by tracing the data transferred between a vector-SIMD DSP and an off-chip memory. The experimental results show that, compared with the baseline tiling and scheduling techniques, the proposed technique reduces energy consumption by an average of 51% for CNN applications such as Tiny YOLOv2, MobileNetv1, and VDSR.
first_indexed	2024-04-13T18:28:41Z
format	Article
id	doaj.art-6463e8b619c144e59367b2ebbcbaf2c8
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-04-13T18:28:41Z
publishDate	2022-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-6463e8b619c144e59367b2ebbcbaf2c8; 2022-12-22T02:35:10Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2022-01-01; vol. 10, pp. 86234-86247; DOI 10.1109/ACCESS.2022.3197206; document no. 9858336; Energy-Efficient Dataflow Scheduling of CNN Applications for Vector-SIMD DSP
	Wontae Kim, Sangheon Lee, Ilwi Yun, Hyuk-Jae Lee (https://orcid.org/0000-0001-8895-9117): Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea
	Chulhee Lee: System LSI Division, Samsung Electronics Corporation, Hwaseong, South Korea
	Kyujoong Lee (https://orcid.org/0000-0002-3080-3010): School of AI Convergence, Sungshin Women’s University, Seoul, South Korea
title	Energy-Efficient Dataflow Scheduling of CNN Applications for Vector-SIMD DSP
topic	DSP, SIMD, CNN, memory access, energy consumption reduction
url	https://ieeexplore.ieee.org/document/9858336/
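
The description field above says that tiling only along the height (not the W-axis) maximizes the data overlapped between consecutive tiles, and that grouping layers lets tiles flow through several layers on chip. The sketch below illustrates the standard row-overlap arithmetic behind both ideas. It is not the paper's energy model or optimization: the names `ConvLayer`, `input_rows_for_tile`, `overlap_rows`, and `rows_read` are hypothetical, and it assumes convolution layers without padding, tiles that span the full feature-map width, and an output height divisible by the tile height.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """Hypothetical layer description; padding is ignored in this sketch."""
    kernel: int  # kernel height K
    stride: int  # vertical stride S


def input_rows_for_tile(layers, tile_out_rows):
    """Input rows the first layer of a group needs to produce
    `tile_out_rows` output rows of the last layer in the group."""
    rows = tile_out_rows
    for layer in reversed(layers):
        rows = (rows - 1) * layer.stride + layer.kernel
    return rows


def total_stride(layers):
    """Product of the strides of all layers in the group."""
    return math.prod(layer.stride for layer in layers)


def overlap_rows(layers):
    """Input rows shared by two vertically adjacent tiles (0 if none)."""
    receptive_field = input_rows_for_tile(layers, 1)
    return max(0, receptive_field - total_stride(layers))


def rows_read(layers, tile_out_rows, total_out_rows, reuse):
    """Input rows fetched from off-chip memory for the whole feature map.
    With reuse=True, overlapped rows stay in the on-chip buffer."""
    if total_out_rows % tile_out_rows != 0:
        raise ValueError("sketch assumes the output height divides evenly")
    num_tiles = total_out_rows // tile_out_rows
    rows_in = input_rows_for_tile(layers, tile_out_rows)
    if not reuse:
        return num_tiles * rows_in
    new_rows = rows_in - overlap_rows(layers)
    return rows_in + (num_tiles - 1) * new_rows


if __name__ == "__main__":
    single = [ConvLayer(kernel=3, stride=1)]
    # 16 output rows in tiles of 4 rows, one 3x3 stride-1 layer.
    print(rows_read(single, 4, 16, reuse=False))  # 24 rows fetched
    print(rows_read(single, 4, 16, reuse=True))   # 18 rows fetched
```

Multiplying the row counts by the feature-map width and channel count gives the number of elements fetched. Because each overlapped row spans the full width when the W-axis is not tiled, the reused data per tile boundary is as large as it can be for a given kernel and stride.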

Similar Items