Energy-Efficient Acceleration of Deep Neural Networks on Realtime-Constrained Embedded Edge Devices

This paper presents a hardware management technique that enables energy-efficient acceleration of deep neural networks (DNNs) on realtime-constrained embedded edge devices. It is becoming increasingly common for edge devices to incorporate dedicated hardware accelerators for neural processing. The execution of neural accelerators generally follows a host-device model, where CPUs offload neural computations (e.g., matrix and vector calculations) to the accelerators for datapath-optimized execution. Such serialized execution is simple to implement and manage, but it is wasteful for resource-limited edge devices to exercise only a single type of processing unit in each discrete execution phase. This paper presents a hardware management technique named NeuroPipe that utilizes the heterogeneous processing units in an embedded edge device to accelerate DNNs in an energy-efficient manner. In particular, NeuroPipe splits a neural network into groups of consecutive layers and pipelines their execution across different types of processing units. The proposed technique offers several advantages for accelerating DNN inference on embedded edge devices. It enables the embedded processor to operate at lower voltage and frequency to enhance energy efficiency while delivering the same performance as uncontrolled baseline executions, or conversely it can dispatch faster inferences at the same energy consumption. Our measurement-driven experiments on an NVIDIA Jetson AGX Xavier with 64 tensor cores and an eight-core ARM CPU demonstrate that NeuroPipe reduces energy consumption by 11.4% on average without performance degradation, or it can achieve 30.5% greater performance for the same energy consumption.
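The core idea the abstract describes — splitting a network into groups of consecutive layers and pipelining their execution on different processing units, so that successive inference requests overlap across stages — can be illustrated with a minimal sketch. This is not the authors' implementation; the layer groups, stage functions, and thread-based "processing units" below are toy stand-ins for the CPU and neural-accelerator stages the paper measures on real hardware.

```python
# Toy sketch of layer-group pipelining: two stages connected by queues,
# each stage standing in for a different processing unit. While stage B
# works on inference i, stage A can already start inference i+1.
import queue
import threading

def make_stage(fn, in_q, out_q):
    """Run fn on items from in_q, forwarding results to out_q."""
    def worker():
        while True:
            item = in_q.get()
            if item is None:          # sentinel: shut the stage down
                out_q.put(None)
                break
            idx, x = item
            out_q.put((idx, fn(x)))
    t = threading.Thread(target=worker)
    t.start()
    return t

# Hypothetical layer groups: group_a stands in for the early layers
# (e.g. run on CPU cores), group_b for the remaining layers
# (e.g. run on the neural accelerator).
group_a = lambda x: [v * 2 for v in x]
group_b = lambda x: sum(x)

q0, q1, q2 = queue.Queue(), queue.Queue(), queue.Queue()
t_a = make_stage(group_a, q0, q1)
t_b = make_stage(group_b, q1, q2)

inputs = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for i, x in enumerate(inputs):
    q0.put((i, x))                    # stream inference requests in
q0.put(None)

results = {}
while True:
    item = q2.get()
    if item is None:
        break
    idx, y = item
    results[idx] = y

t_a.join(); t_b.join()
print([results[i] for i in range(len(inputs))])   # [12, 30, 48]
```

The energy argument in the paper follows from this overlap: because each unit only executes its own layer group, the embedded processor can be clocked at a lower voltage/frequency point while the pipeline as a whole sustains the baseline inference rate.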


Bibliographic Details
Main Authors: Bogil Kim, Sungjae Lee, Amit Ranjan Trivedi, William J. Song
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access
Subjects: Deep neural networks; heterogeneous computing; embedded processors; hardware management; hardware measurement; energy efficiency
Online Access: https://ieeexplore.ieee.org/document/9262933/
Record Details
Collection: DOAJ (Directory of Open Access Journals)
Record ID: doaj.art-90ea5134965d486f9679abdf6026d4b2
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3038908
IEEE Document: 9262933
Citation: IEEE Access, vol. 8, pp. 216259-216270, 2020-01-01
Authors and Affiliations:
Bogil Kim (https://orcid.org/0000-0002-2332-7933), School of Electrical and Electronic Engineering, Yonsei University, Seoul, South Korea
Sungjae Lee, School of Electrical and Electronic Engineering, Yonsei University, Seoul, South Korea
Amit Ranjan Trivedi, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, IL, USA
William J. Song (https://orcid.org/0000-0001-9170-5986), School of Electrical and Electronic Engineering, Yonsei University, Seoul, South Korea
Keywords: Deep neural networks; heterogeneous computing; embedded processors; hardware management; hardware measurement; energy efficiency
First Indexed: 2024-12-24T04:55:04Z
Last Indexed: 2024-12-24T04:55:04Z