Energy-Efficient Dataflow Scheduling of CNN Applications for Vector-SIMD DSP
Dataflow-scheduling techniques for convolutional neural networks (CNNs) have been studied extensively to minimize off-chip memory access. However, the efficiency of previously proposed techniques is limited because their optimizations consider only general hardware such as FPGAs and GPUs. To...
Main Authors: | Wontae Kim, Sangheon Lee, Ilwi Yun, Chulhee Lee, Kyujoong Lee, Hyuk-Jae Lee |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2022-01-01 |
Series: | IEEE Access |
Subjects: | DSP, SIMD, CNN, memory access, energy consumption reduction |
Online Access: | https://ieeexplore.ieee.org/document/9858336/ |
_version_ | 1828321570020392960 |
---|---|
author | Wontae Kim, Sangheon Lee, Ilwi Yun, Chulhee Lee, Kyujoong Lee, Hyuk-Jae Lee |
author_sort | Wontae Kim |
collection | DOAJ |
description | Dataflow-scheduling techniques for convolutional neural networks (CNNs) have been studied extensively to minimize off-chip memory access. However, the efficiency of previously proposed techniques is limited because their optimizations consider only general hardware such as FPGAs and GPUs. To overcome this limitation, this paper proposes dataflow scheduling for a vector-SIMD DSP that minimizes the energy consumed by off-chip memory access. First, the proposed technique attempts to group as many of the given layers as possible. Within a group, the tiles of different layers are executed in sequence without off-chip memory access, except in the first and last layers of the group. The length of each group is chosen to minimize the off-chip memory energy, as estimated by the proposed energy model of the off-chip memory. However, grouping layers introduces additional computation. To minimize this overhead, this paper solves the optimization problem for in the grouped layers. Second, for layers that cannot be grouped, tiling along the W-axis is not applied, which maximizes the size of the data overlapped between consecutive tiles. Consequently, reuse of the overlapped data in the on-chip buffer is maximized, which reduces the energy consumed by the off-chip memory. For evaluation, a cycle-accurate simulation environment is established to measure the energy consumption of the off-chip memory by tracing the data transferred between a vector-SIMD DSP and an off-chip memory. The experimental results show that, compared with the baseline tiling and scheduling techniques, the proposed technique reduces the energy consumption by an average of 51% for CNN applications such as Tiny YOLOv2, MobileNetv1, and VDSR. |
first_indexed | 2024-04-13T18:28:41Z |
format | Article |
id | doaj.art-6463e8b619c144e59367b2ebbcbaf2c8 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-13T18:28:41Z |
publishDate | 2022-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-6463e8b619c144e59367b2ebbcbaf2c8; 2022-12-22T02:35:10Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2022-01-01; vol. 10, pp. 86234-86247; DOI 10.1109/ACCESS.2022.3197206; document 9858336; Energy-Efficient Dataflow Scheduling of CNN Applications for Vector-SIMD DSP; Wontae Kim (Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea); Sangheon Lee (Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea); Ilwi Yun (Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea); Chulhee Lee (System LSI Division, Samsung Electronics Corporation, Hwaseong, South Korea); Kyujoong Lee (https://orcid.org/0000-0002-3080-3010; School of AI Convergence, Sungshin Women’s University, Seoul, South Korea); Hyuk-Jae Lee (https://orcid.org/0000-0001-8895-9117; Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea); https://ieeexplore.ieee.org/document/9858336/; DSP; SIMD; CNN; memory access; energy consumption reduction |
title | Energy-Efficient Dataflow Scheduling of CNN Applications for Vector-SIMD DSP |
title_sort | energy efficient dataflow scheduling of cnn applications for vector simd dsp |
topic | DSP SIMD CNN memory access energy consumption reduction |
url | https://ieeexplore.ieee.org/document/9858336/ |
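
The two ideas in the description (grouping layers so that intermediate tiles stay on chip, and tiling only along the H-axis so that consecutive tiles share rows) can be illustrated with a minimal Python sketch. It is not the paper's energy model or optimization: it counts input rows as a rough proxy for off-chip traffic and assumes unpadded convolutions with square kernels. The names `ConvLayer`, `input_rows_needed`, and `rows_fetched` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """Hypothetical layer description: square kernel and stride, no padding."""
    kernel: int
    stride: int


def input_rows_needed(layers, out_rows):
    """Rows of the group's first input needed to produce `out_rows` rows of
    the group's last output, found by walking the layers backwards.

    The growth from layer to layer is the source of the additional
    computation that grouping introduces.
    """
    rows = out_rows
    for layer in reversed(layers):
        rows = (rows - 1) * layer.stride + layer.kernel
    return rows


def rows_fetched(layer, total_out_rows, tile_out_rows, reuse_overlap):
    """Input rows read from off-chip memory for one layer tiled along H only.

    Because W is not tiled, every row is full width, so the rows shared by
    consecutive tiles form the largest possible overlapped region.
    Assumption: `total_out_rows` is a multiple of `tile_out_rows`.
    """
    if total_out_rows % tile_out_rows != 0:
        raise ValueError("total_out_rows must be a multiple of tile_out_rows")
    n_tiles = total_out_rows // tile_out_rows
    rows_per_tile = input_rows_needed([layer], tile_out_rows)
    if not reuse_overlap:
        # Every tile reloads all of its input rows, including shared ones.
        return n_tiles * rows_per_tile
    # Rows shared with the previous tile are kept in the on-chip buffer.
    overlap = max(layer.kernel - layer.stride, 0)
    return rows_per_tile + (n_tiles - 1) * (rows_per_tile - overlap)


if __name__ == "__main__":
    conv3x3 = ConvLayer(kernel=3, stride=1)
    # Two grouped 3x3 layers: a 4-row output tile needs 8 input rows.
    print(input_rows_needed([conv3x3, conv3x3], 4))
    # One ungrouped layer, 12 output rows in tiles of 4 rows each.
    print(rows_fetched(conv3x3, 12, 4, reuse_overlap=False))
    print(rows_fetched(conv3x3, 12, 4, reuse_overlap=True))
```

Under these assumptions the script prints 8, 18, and 14: the grouped tile needs a taller input than its output, and keeping the overlapped rows on chip removes the 4 rows that would otherwise be fetched twice.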