A Novel Throughput Enhancement Method for Deep Learning Applications on Mobile Devices With Heterogeneous Processors


Bibliographic Details
Main Authors: Choonghoon Park, Soonhoi Ha
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Online Access:https://ieeexplore.ieee.org/document/10464274/
Description
Summary: Contemporary smartphones integrate dedicated AI accelerators alongside CPUs and GPUs in response to the growing demand for deep learning applications. While existing software development kits (SDKs) for these devices provide neural network optimization techniques, they often lack system-level optimizations, specifically in distributing layers across heterogeneous processors. This paper introduces a novel approach to enhance the throughput of deep learning applications through quantization and pipelining techniques. The proposed technique employs different quantization schemes for activation data and filter weights to minimize the accuracy drop. A genetic algorithm explores the extensive design space of layer-wise mapping and pipelining to find the best pipelining solution. To estimate the performance of each candidate solution, the actual execution time of the application on the device is measured, accounting for unique smartphone characteristics such as dynamic voltage and frequency scaling (DVFS) and OS scheduling. The impact of thermal throttling on throughput is also investigated by running benchmark applications continuously for 10 minutes. Our technique is validated through experiments conducted on the Google Pixel 6 and Samsung Galaxy S22. Compared to single-processor mappings for networks with floating-point parameters, throughput enhancements ranging from $\times 5.4$ to $\times 7.6$ on the Google Pixel 6 and from $\times 35.5$ to $\times 44.2$ on the Samsung Galaxy S22 are achieved.
These results confirm that significant performance improvements can be achieved through the proposed software optimization methodology on contemporary smartphones, operating under their diverse constraints at the user level.
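The layer-wise mapping search described in the summary can be sketched as a simple genetic algorithm over layer-to-processor assignments. This is an illustrative sketch only: the per-layer latency table and the pipeline-throughput model below are invented assumptions (the paper instead measures the actual execution time of each candidate on the device, capturing DVFS and OS scheduling effects), and the selection/crossover/mutation settings are not taken from the paper.

```python
import random

# Hypothetical per-layer latency (ms) on each processor. On a real device
# these numbers would come from measured executions, not constants.
PROCESSORS = ["CPU", "GPU", "NPU"]
LAYER_COST = [
    {"CPU": 9.0, "GPU": 4.0, "NPU": 2.0},
    {"CPU": 7.0, "GPU": 3.5, "NPU": 1.5},
    {"CPU": 8.0, "GPU": 5.0, "NPU": 2.5},
    {"CPU": 6.0, "GPU": 2.0, "NPU": 4.0},
    {"CPU": 5.0, "GPU": 2.5, "NPU": 1.0},
]

def throughput(mapping):
    # In a pipelined execution, steady-state throughput is bounded by the
    # busiest processor (stage): sum the latency each processor is assigned
    # and take the maximum as the bottleneck.
    busy = {p: 0.0 for p in PROCESSORS}
    for layer, proc in enumerate(mapping):
        busy[proc] += LAYER_COST[layer][proc]
    bottleneck = max(busy.values())
    return 1000.0 / bottleneck  # inferences per second

def evolve(pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    n = len(LAYER_COST)
    # Each individual is a list assigning one processor to each layer.
    pop = [[rng.choice(PROCESSORS) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=throughput, reverse=True)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)             # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:                # random mutation
                child[rng.randrange(n)] = rng.choice(PROCESSORS)
            children.append(child)
        pop = survivors + children
    return max(pop, key=throughput)

best = evolve()
print(best, round(throughput(best), 2))
```

With measured fitness substituted for the latency table, the same loop explores the joint mapping-and-pipelining space; the elitist selection guarantees the best candidate found so far is never discarded between generations.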
ISSN:2169-3536