Energy-Efficient Acceleration of Deep Neural Networks on Realtime-Constrained Embedded Edge Devices
This paper presents a hardware management technique that enables energy-efficient acceleration of deep neural networks (DNNs) on realtime-constrained embedded edge devices. It is becoming increasingly common for edge devices to incorporate dedicated hardware accelerators for neural processing. The execu...
Main Authors: | Bogil Kim, Sungjae Lee, Amit Ranjan Trivedi, William J. Song |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2020-01-01 |
Series: | IEEE Access |
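Subjects: | Deep neural networks; heterogeneous computing; embedded processors; hardware management; hardware measurement; energy efficiency |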
Online Access: | https://ieeexplore.ieee.org/document/9262933/ |
_version_ | 1819296247079501824 |
---|---|
author | Bogil Kim; Sungjae Lee; Amit Ranjan Trivedi; William J. Song |
author_facet | Bogil Kim; Sungjae Lee; Amit Ranjan Trivedi; William J. Song |
author_sort | Bogil Kim |
collection | DOAJ |
description | This paper presents a hardware management technique that enables energy-efficient acceleration of deep neural networks (DNNs) on realtime-constrained embedded edge devices. It is becoming increasingly common for edge devices to incorporate dedicated hardware accelerators for neural processing. The execution of neural accelerators in general follows a host-device model, where CPUs offload neural computations (e.g., matrix and vector calculations) to the accelerators for datapath-optimized executions. Such a serialized execution is simple to implement and manage, but it is wasteful for resource-limited edge devices to exercise only a single type of processing unit in each discrete execution phase. This paper presents a hardware management technique named NeuroPipe that utilizes heterogeneous processing units in an embedded edge device to accelerate DNNs in an energy-efficient manner. In particular, NeuroPipe splits a neural network into groups of consecutive layers and pipelines their executions using different types of processing units. The proposed technique offers several advantages for accelerating DNN inference in the embedded edge device. It enables the embedded processor to operate at lower voltage and frequency to enhance energy efficiency while delivering the same performance as uncontrolled baseline executions, or conversely it can dispatch faster inferences at the same energy consumption. Our measurement-driven experiments based on NVIDIA Jetson AGX Xavier with 64 tensor cores and an eight-core ARM CPU demonstrate that NeuroPipe reduces energy consumption by 11.4% on average without performance degradation, or it can achieve 30.5% greater performance for the same energy consumption. |
first_indexed | 2024-12-24T04:55:04Z |
format | Article |
id | doaj.art-90ea5134965d486f9679abdf6026d4b2 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-24T04:55:04Z |
publishDate | 2020-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-90ea5134965d486f9679abdf6026d4b2; 2022-12-21T17:14:24Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2020-01-01; vol. 8; pp. 216259-216270; DOI 10.1109/ACCESS.2020.3038908; 9262933; Energy-Efficient Acceleration of Deep Neural Networks on Realtime-Constrained Embedded Edge Devices; Bogil Kim (https://orcid.org/0000-0002-2332-7933), School of Electrical and Electronic Engineering, Yonsei University, Seoul, South Korea; Sungjae Lee, School of Electrical and Electronic Engineering, Yonsei University, Seoul, South Korea; Amit Ranjan Trivedi, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, IL, USA; William J. Song (https://orcid.org/0000-0001-9170-5986), School of Electrical and Electronic Engineering, Yonsei University, Seoul, South Korea; https://ieeexplore.ieee.org/document/9262933/; Deep neural networks; heterogeneous computing; embedded processors; hardware management; hardware measurement; energy efficiency |
spellingShingle | Bogil Kim Sungjae Lee Amit Ranjan Trivedi William J. Song Energy-Efficient Acceleration of Deep Neural Networks on Realtime-Constrained Embedded Edge Devices IEEE Access Deep neural networks heterogeneous computing embedded processors hardware management hardware measurement energy efficiency |
title | Energy-Efficient Acceleration of Deep Neural Networks on Realtime-Constrained Embedded Edge Devices |
title_full | Energy-Efficient Acceleration of Deep Neural Networks on Realtime-Constrained Embedded Edge Devices |
title_fullStr | Energy-Efficient Acceleration of Deep Neural Networks on Realtime-Constrained Embedded Edge Devices |
title_full_unstemmed | Energy-Efficient Acceleration of Deep Neural Networks on Realtime-Constrained Embedded Edge Devices |
title_short | Energy-Efficient Acceleration of Deep Neural Networks on Realtime-Constrained Embedded Edge Devices |
title_sort | energy efficient acceleration of deep neural networks on realtime constrained embedded edge devices |
topic | Deep neural networks; heterogeneous computing; embedded processors; hardware management; hardware measurement; energy efficiency |
url | https://ieeexplore.ieee.org/document/9262933/ |
work_keys_str_mv | AT bogilkim energyefficientaccelerationofdeepneuralnetworksonrealtimeconstrainedembeddededgedevices AT sungjaelee energyefficientaccelerationofdeepneuralnetworksonrealtimeconstrainedembeddededgedevices AT amitranjantrivedi energyefficientaccelerationofdeepneuralnetworksonrealtimeconstrainedembeddededgedevices AT williamjsong energyefficientaccelerationofdeepneuralnetworksonrealtimeconstrainedembeddededgedevices |
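
The description above says that NeuroPipe splits a network into groups of consecutive layers and pipelines their execution on different processing units. The following is only a minimal sketch of that general idea, not the authors' implementation: the helper names (`split_layers`, `make_stage`, `run_pipeline`), the toy arithmetic "layers", and the use of Python threads as stand-ins for the CPU and the accelerator are all assumptions made for illustration. It does not model voltage/frequency scaling or energy.

```python
import queue
import threading


def split_layers(layers, cut):
    """Split a list of layers into two groups of consecutive layers."""
    return layers[:cut], layers[cut:]


def make_stage(group):
    """Compose a group of consecutive layers into one callable stage."""
    def stage(x):
        for layer in group:
            x = layer(x)
        return x
    return stage


def run_pipeline(stages, inputs):
    """Stream inputs through the stages, one worker thread per stage.

    Each thread is a hypothetical stand-in for one processing unit, so
    stage 2 can work on input k while stage 1 already handles input k+1.
    """
    # One queue in front of each stage, plus a final queue for results.
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    done = object()  # sentinel marking the end of the input stream

    def worker(stage, q_in, q_out):
        while True:
            item = q_in.get()
            if item is done:
                q_out.put(done)  # forward shutdown to the next stage
                return
            q_out.put(stage(item))

    threads = [
        threading.Thread(target=worker, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for x in inputs:
        queues[0].put(x)
    queues[0].put(done)

    results = []
    while True:
        item = queues[-1].get()
        if item is done:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


# Toy "layers": plain functions standing in for real DNN layers.
layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
front, back = split_layers(layers, 2)
outputs = run_pipeline([make_stage(front), make_stage(back)], [0, 1, 2])
print(outputs)
```

Because each stage is served by a single thread reading from a FIFO queue, results come out in input order and match what a plain serialized run of all layers would produce; only the overlap between stages differs.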