ConvKyber: Unleashing the Power of AI Accelerators for Faster Kyber with Novel Iteration-based Approaches
Main Authors: | Tian Zhou, Fangyu Zheng, Guang Fan, Lipeng Wan, Wenxu Tang, Yixuan Song, Yi Bian, Jingqiang Lin |
---|---|
Format: | Article |
Language: | English |
Published: | Ruhr-Universität Bochum, 2024-03-01 |
Series: | Transactions on Cryptographic Hardware and Embedded Systems |
Subjects: | Lattice-based Cryptography, GPUs, Tensor Core, Kyber |
Online Access: | https://tches.iacr.org/index.php/TCHES/article/view/11420 |
_version_ | 1797262226701156352 |
---|---|
author | Tian Zhou, Fangyu Zheng, Guang Fan, Lipeng Wan, Wenxu Tang, Yixuan Song, Yi Bian, Jingqiang Lin |
author_sort | Tian Zhou |
collection | DOAJ |
description |
AI accelerators offer remarkable performance and a promising way to speed up cryptographic algorithms, particularly lattice-based cryptography. However, current approaches to leveraging AI accelerators often remain at a rudimentary level of implementation and overlook the intricate internal mechanisms of these devices. Consequently, a significant share of their computational resources goes underutilized.
In this paper, we present a comprehensive exploration of NVIDIA Tensor Cores and introduce a novel framework tailored specifically to Kyber. First, we propose two approaches that efficiently break down Kyber’s NTT into iterative matrix multiplications, reducing costs by approximately 75% compared with state-of-the-art scanning-based methods. Second, by reverse-engineering the internal mechanisms of Tensor Cores, we manipulate their internal resources precisely using assembly-level code instead of the inefficient standard interfaces, eliminating memory accesses and redundant function calls. Finally, building on our highly optimized NTT, we provide a complete implementation for all parameter sets of Kyber. Our implementation surpasses the state-of-the-art Tensor Core-based work, achieving speed-ups of 1.93x, 1.65x, 1.22x, and 3.55x for polyvec_ntt, KeyGen, Enc, and Dec in Kyber-1024, respectively. Although the full Kyber implementation is throughput-oriented, its execution latency remains acceptable: for instance, it ranges from 1.02 to 5.68 milliseconds for Kyber-1024 on R3080 at peak throughput.
|
first_indexed | 2024-04-24T23:53:45Z |
format | Article |
id | doaj.art-22734d8eaee6410c8372164f3255bbf5 |
institution | Directory Open Access Journal |
issn | 2569-2925 |
language | English |
last_indexed | 2024-04-24T23:53:45Z |
publishDate | 2024-03-01 |
publisher | Ruhr-Universität Bochum |
record_format | Article |
series | Transactions on Cryptographic Hardware and Embedded Systems |
spelling | doaj.art-22734d8eaee6410c8372164f3255bbf5; 2024-03-14T16:24:49Z; eng; Ruhr-Universität Bochum; Transactions on Cryptographic Hardware and Embedded Systems; 2569-2925; 2024-03-01; 2024; 2; 10.46586/tches.v2024.i2.25-63; ConvKyber: Unleashing the Power of AI Accelerators for Faster Kyber with Novel Iteration-based Approaches; Tian Zhou (School of Cyber Security, University of Science and Technology of China, Hefei, China); Fangyu Zheng (School of Cryptology, University of Chinese Academy of Sciences, Beijing, China); Guang Fan (Ant Group, Hangzhou, China); Lipeng Wan (School of Cryptology, University of Chinese Academy of Sciences, Beijing, China); Wenxu Tang (School of Cyber Security, University of Science and Technology of China, Hefei, China); Yixuan Song (Ant Group, Hangzhou, China); Yi Bian (School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China); Jingqiang Lin (School of Cyber Security, University of Science and Technology of China, Hefei, China, and Beijing Research Institute, University of Science and Technology of China, Beijing, China); https://tches.iacr.org/index.php/TCHES/article/view/11420; Lattice-based Cryptography; GPUs; Tensor Core; Kyber |
title | ConvKyber: Unleashing the Power of AI Accelerators for Faster Kyber with Novel Iteration-based Approaches |
topic | Lattice-based Cryptography, GPUs, Tensor Core, Kyber |
url | https://tches.iacr.org/index.php/TCHES/article/view/11420 |
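
The description above says that Kyber's NTT is broken down into iterative matrix multiplications. The sketch below illustrates only the general idea behind that statement and is not the authors' decomposition: an NTT is a linear map, so it can be written as a matrix product modulo q, and a length-N transform with N = N1 x N2 can be split into two smaller matrix multiplications with a twiddle-factor scaling in between. The modulus 3329 and the root of unity 17 are Kyber's standard parameters. Everything else is an assumption made for illustration: the toy size N = 16, the use of a plain cyclic transform rather than Kyber's negacyclic, incomplete NTT, and the hypothetical helper names `matmul_mod`, `dft_matrix`, `ntt_direct` and `ntt_two_step`.

```python
Q = 3329   # Kyber modulus
ZETA = 17  # primitive 256th root of unity modulo Q (Kyber parameter)


def matmul_mod(a, b, q=Q):
    """Multiply two matrices given as lists of rows, reducing entries mod q."""
    return [[sum(x * y for x, y in zip(row, col)) % q for col in zip(*b)]
            for row in a]


def dft_matrix(root, size, q=Q):
    """Transform matrix with entry (i, j) equal to root^(i*j) mod q."""
    return [[pow(root, i * j, q) for j in range(size)] for i in range(size)]


def ntt_direct(x, omega, q=Q):
    """Cyclic NTT as one matrix-vector product: X = W x mod q."""
    column = [[v] for v in x]
    return [row[0] for row in matmul_mod(dft_matrix(omega, len(x), q), column, q)]


def ntt_two_step(x, omega, n1, n2, q=Q):
    """Same transform as two small matrix products plus twiddle scaling."""
    # Reshape the input: a[j1][j2] = x[n2*j1 + j2].
    a = [x[n2 * j1:n2 * (j1 + 1)] for j1 in range(n1)]
    # First product: size-n1 transforms along the columns, b[k1][j2].
    b = matmul_mod(dft_matrix(pow(omega, n2, q), n1, q), a, q)
    # Twiddle factors: scale b[k1][j2] by omega^(j2*k1).
    c = [[b[k1][j2] * pow(omega, j2 * k1, q) % q for j2 in range(n2)]
         for k1 in range(n1)]
    # Second product: size-n2 transforms along the rows, d[k1][k2].
    d = matmul_mod(c, dft_matrix(pow(omega, n1, q), n2, q), q)
    # Output index is k = k1 + n1*k2.
    out = [0] * (n1 * n2)
    for k1 in range(n1):
        for k2 in range(n2):
            out[k1 + n1 * k2] = d[k1][k2]
    return out


N = 16                           # toy size, assumed for illustration
OMEGA = pow(ZETA, 256 // N, Q)   # a 16th root of unity derived from ZETA
x = list(range(1, N + 1))
assert ntt_direct(x, OMEGA) == ntt_two_step(x, OMEGA, 4, 4)
```

Expressing the transform as dense matrix products is what makes it a candidate for matrix-multiply hardware such as Tensor Cores. How the paper maps Kyber's actual NTT onto that hardware, including operand precision and modular reduction, is not described in this record.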