Summary: | Fully Homomorphic Encryption (FHE) enables offloading computation to untrusted servers with cryptographic privacy. Despite its attractive security, FHE is not yet widely adopted due to its prohibitive overheads, about 10,000× over unencrypted computation.
Hardware acceleration is an attractive approach to bridge this performance gap, but it brings new challenges. These include operations on large vectors with complex dependencies that current vector processor architectures cannot handle, as well as extreme memory bandwidth demands. This thesis presents two FHE accelerators that address these challenges: F1 and CraterLake.
F1 is the first programmable FHE accelerator, i.e., capable of executing full FHE programs. F1 is a wide-vector processor with novel functional units deeply specialized to FHE primitives. This organization provides so much compute throughput that data movement becomes the key bottleneck. Thus, F1 is primarily designed to minimize data movement. It speeds up shallow FHE computations (i.e., those of limited multiplicative depth) by gmean 5,400× over a 4-core CPU. Unfortunately, F1 becomes memory bandwidth bound on deeper computations (e.g., deep neural networks). This is because deep FHE programs require very large ciphertexts (tens of MBs each) and different algorithms that F1 does not support well.
CraterLake addresses these shortcomings and is the first accelerator to effectively speed up arbitrarily large FHE programs. CraterLake introduces a new hardware architecture that efficiently scales to very large ciphertexts, novel functional units to accelerate key kernels, and new algorithms and compiler techniques to reduce data movement. These advances help CraterLake outperform a 32-core CPU by gmean 4,600× and deliver 11.2× the performance of F1 on deep benchmarks, even when we scale F1’s architecture to the size of CraterLake. These speedups enable new applications for FHE, such as real-time inference using deep neural networks.
|