ConvKyber: Unleashing the Power of AI Accelerators for Faster Kyber with Novel Iteration-based Approaches

The remarkable performance capabilities of AI accelerators offer promising opportunities for accelerating cryptographic algorithms, particularly in the context of lattice-based cryptography. However, current approaches to leveraging AI accelerators often remain at a rudimentary level of implementat...


Bibliographic Details
Main Authors: Tian Zhou, Fangyu Zheng, Guang Fan, Lipeng Wan, Wenxu Tang, Yixuan Song, Yi Bian, Jingqiang Lin
Format: Article
Language: English
Published: Ruhr-Universität Bochum 2024-03-01
Series: Transactions on Cryptographic Hardware and Embedded Systems
Subjects: Lattice-based Cryptography, GPUs, Tensor Core, Kyber
Online Access: https://tches.iacr.org/index.php/TCHES/article/view/11420
collection DOAJ
description The remarkable performance of AI accelerators offers promising opportunities for accelerating cryptographic algorithms, particularly lattice-based cryptography. However, current approaches to leveraging AI accelerators often remain rudimentary, overlooking the intricate internal mechanisms of these devices. Consequently, a significant share of their computational resources is underutilized. In this paper, we present a comprehensive exploration of NVIDIA Tensor Cores and introduce a novel framework tailored specifically for Kyber. First, we propose two approaches that efficiently break down Kyber's NTT into iterative matrix multiplications, reducing costs by approximately 75% compared with state-of-the-art scanning-based methods. Second, by reverse-engineering the internal mechanisms of Tensor Cores, we precisely manipulate their internal resources using assembly-level code instead of inefficient standard interfaces, eliminating memory accesses and redundant function calls. Finally, building on our highly optimized NTT, we provide a complete implementation for all parameter sets of Kyber. Our implementation surpasses the state-of-the-art Tensor Core based work, achieving speed-ups of 1.93x, 1.65x, 1.22x and 3.55x for polyvec_ntt, KeyGen, Enc and Dec in Kyber-1024, respectively. Although the full Kyber implementation is throughput-oriented, its execution latency remains acceptable: for instance, it ranges from 1.02 to 5.68 milliseconds for Kyber-1024 on R3080 at peak throughput.
id doaj.art-22734d8eaee6410c8372164f3255bbf5
institution Directory Open Access Journal
issn 2569-2925
spelling doaj.art-22734d8eaee6410c8372164f3255bbf5 2024-03-14T16:24:49Z eng 2024 2 10.46586/tches.v2024.i2.25-63
Tian Zhou (0): School of Cyber Security, University of Science and Technology of China, Hefei, China
Fangyu Zheng (1): School of Cryptology, University of Chinese Academy of Sciences, Beijing, China
Guang Fan (2): Ant Group, Hangzhou, China
Lipeng Wan (3): School of Cryptology, University of Chinese Academy of Sciences, Beijing, China
Wenxu Tang (4): School of Cyber Security, University of Science and Technology of China, Hefei, China
Yixuan Song (5): Ant Group, Hangzhou, China
Yi Bian (6): School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China
Jingqiang Lin (7): School of Cyber Security, University of Science and Technology of China, Hefei, China; Beijing Research Institute, University of Science and Technology of China, Beijing, China
topic Lattice-based Cryptography
GPUs
Tensor Core
Kyber