Implementing a Persistent Offline Cache Improving Time to First Execution (TTFX) of GPU Code in Julia
Main Author: | |
Other Authors: | |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2023 |
Online Access: | https://hdl.handle.net/1721.1/151406 |
Summary: | GPUs allow users to run code with high data parallelism efficiently on specialized hardware. GPUCompiler.jl provides a GPU compilation process for Julia, allowing users to write the highly efficient vector operations common in scientific computing. However, GPUCompiler.jl does not support the same level of persistent offline caching that is available in the core Julia compiler. This increases the time to first execution (TTFX), as programs must recompile GPU code on every package reload regardless of whether any code has changed. In this thesis we implement a persistent offline cache capable of storing both type-inferred and native code, drastically reducing the TTFX of precompiled GPU code. We demonstrate that by caching native code, execution can be sped up 2-3x while reducing compilation storage costs by 3-40x compared to the current GPU compilation process. |
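To make the caching idea concrete, below is a minimal, hypothetical sketch of a persistent offline cache in plain Julia. It is not the thesis's GPUCompiler.jl integration: `cached_compile`, `CACHE_DIR`, and the stand-in compiler function are illustrative names only. The sketch assumes that a compiled artifact can be serialized to disk and keyed by a hash of the function plus its argument types, so a fresh Julia session can reload it instead of recompiling.

```julia
using SHA, Serialization

# Illustrative on-disk cache directory (hypothetical name, not from the thesis).
const CACHE_DIR = joinpath(homedir(), ".cache", "demo_gpu_cache")

# Look up a compiled artifact on disk; compile and store it only on a cache miss.
# `compile` is a stand-in for an expensive compilation step (e.g. GPU codegen).
function cached_compile(compile::Function, f, argtypes)
    mkpath(CACHE_DIR)
    key  = bytes2hex(sha256(string(f, "::", argtypes)))   # stable key from function + signature
    path = joinpath(CACHE_DIR, key * ".jls")
    if isfile(path)
        return deserialize(path)      # cache hit: reuse artifact from an earlier session
    end
    artifact = compile(f, argtypes)   # cache miss: pay the compilation cost once
    serialize(path, artifact)         # persist for future package loads
    return artifact
end

# Usage with a toy "compiler" that just returns a description string.
artifact = cached_compile((f, Ts) -> "compiled $(f) for $(Ts)", sin, (Float32,))
```

The sketch only illustrates the hash-keyed, disk-backed lookup that avoids recompilation across sessions; per the abstract, the thesis itself hooks into the GPU compilation process and stores type-inferred and native code rather than arbitrary serialized objects.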