Implementing a Persistent Offline Cache Improving Time to First Execution (TTFX) of GPU Code in Julia

Bibliographic Details
Main Author: Warner, Collin
Other Authors: Edelman, Alan
Format: Thesis
Published: Massachusetts Institute of Technology 2023
Online Access: https://hdl.handle.net/1721.1/151406
Description
Summary: GPUs allow users to run code with high data parallelism efficiently on specialized hardware. GPUCompiler.jl provides a GPU compilation pipeline for Julia, allowing users to write the highly efficient vector operations common in scientific computing. However, GPUCompiler.jl does not support the same level of persistent offline caching that is available in the core Julia compiler. This increases the time to first execution (TTFX), because programs must recompile GPU code on every package reload regardless of whether any code has changed. In this thesis we implement a persistent offline cache capable of storing both type-inferred and native code, drastically reducing TTFX for precompiled GPU code. We demonstrate that by caching native code, execution can be sped up 2-3x while reducing compilation storage costs by 3-40x compared to the current GPU compilation process.
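
The caching idea described in the summary can be illustrated with a minimal, hypothetical sketch in Julia. The cache location, key scheme, and the `cached_compile`/`job_descriptor` names below are assumptions made for illustration only, not GPUCompiler.jl's actual interface; the point is simply that a compilation result persisted to disk and keyed by the compilation job can be reused across sessions instead of being rebuilt on every package load.

```julia
# Hypothetical on-disk compilation cache, keyed by a hash of a compilation job.
# This is not GPUCompiler.jl's real API; it only sketches the general mechanism
# the thesis describes: persist compiled results so later sessions skip recompilation.
using Serialization, SHA

const CACHE_DIR = joinpath(homedir(), ".cache", "gpu_code_cache")  # assumed location

cache_path(key::String) = joinpath(CACHE_DIR, key * ".bin")

# `job_descriptor` stands in for whatever uniquely identifies a compilation job
# (kernel name, argument types, target, compiler version, ...).
function cached_compile(compile::Function, job_descriptor)
    mkpath(CACHE_DIR)
    key = bytes2hex(sha256(repr(job_descriptor)))
    path = cache_path(key)
    if isfile(path)
        # Cache hit: reuse the previously stored compilation result.
        return deserialize(path)
    end
    # Cache miss: run the expensive compiler and persist the result for next time.
    result = compile(job_descriptor)
    serialize(path, result)
    return result
end
```

In a real system the cache key would also have to capture the compiler version and target configuration, so that stale native code is never reused after an upgrade or a change of GPU.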