ConvKyber: Unleashing the Power of AI Accelerators for Faster Kyber with Novel Iteration-based Approaches

The remarkable performance capabilities of AI accelerators offer promising opportunities for accelerating cryptographic algorithms, particularly in the context of lattice-based cryptography. However, current approaches to leveraging AI accelerators often remain at a rudimentary level of implementat...


Bibliographic Details
Main Authors:	Tian Zhou, Fangyu Zheng, Guang Fan, Lipeng Wan, Wenxu Tang, Yixuan Song, Yi Bian, Jingqiang Lin
Format:	Article
Language:	English
Published:	Ruhr-Universität Bochum 2024-03-01
Series:	Transactions on Cryptographic Hardware and Embedded Systems
Subjects:	Lattice-based Cryptography; GPUs; Tensor Core; Kyber
Online Access:	https://tches.iacr.org/index.php/TCHES/article/view/11420

author	Tian Zhou, Fangyu Zheng, Guang Fan, Lipeng Wan, Wenxu Tang, Yixuan Song, Yi Bian, Jingqiang Lin
collection	DOAJ
description	The remarkable performance of AI accelerators offers promising opportunities for accelerating cryptographic algorithms, particularly lattice-based cryptography. However, current approaches to leveraging AI accelerators often remain at a rudimentary level of implementation and overlook the intricate internal mechanisms of these devices. Consequently, a significant share of computational resources is underutilized. In this paper, we present a comprehensive exploration of NVIDIA Tensor Cores and introduce a novel framework tailored specifically for Kyber. First, we propose two approaches that efficiently break down Kyber’s NTT into iterative matrix multiplications, reducing costs by approximately 75% compared to the state-of-the-art scanning-based methods. Second, by reverse-engineering the internal mechanisms, we precisely manipulate the internal resources of Tensor Cores using assembly-level code instead of inefficient standard interfaces, eliminating memory accesses and redundant function calls. Finally, building upon our highly optimized NTT, we provide a complete implementation for all parameter sets of Kyber. Our implementation surpasses the state-of-the-art Tensor Core based work, achieving speed-ups of 1.93x, 1.65x, 1.22x and 3.55x for polyvec_ntt, KeyGen, Enc and Dec in Kyber-1024, respectively. Although it is throughput-oriented, our full Kyber implementation maintains an acceptable execution latency. For instance, the execution latency ranges from 1.02 to 5.68 milliseconds for Kyber-1024 on R3080 at peak throughput.
first_indexed	2024-04-24T23:53:45Z
format	Article
id	doaj.art-22734d8eaee6410c8372164f3255bbf5
institution	Directory Open Access Journal
issn	2569-2925
language	English
last_indexed	2024-04-24T23:53:45Z
publishDate	2024-03-01
publisher	Ruhr-Universität Bochum
record_format	Article
series	Transactions on Cryptographic Hardware and Embedded Systems
spelling	doaj.art-22734d8eaee6410c8372164f3255bbf5; 2024-03-14T16:24:49Z; eng; Ruhr-Universität Bochum; Transactions on Cryptographic Hardware and Embedded Systems; ISSN 2569-2925; 2024-03-01; volume 2024, issue 2; DOI 10.46586/tches.v2024.i2.25-63; ConvKyber: Unleashing the Power of AI Accelerators for Faster Kyber with Novel Iteration-based Approaches; https://tches.iacr.org/index.php/TCHES/article/view/11420; Lattice-based Cryptography; GPUs; Tensor Core; Kyber
	Tian Zhou (School of Cyber Security, University of Science and Technology of China, Hefei, China)
	Fangyu Zheng (School of Cryptology, University of Chinese Academy of Sciences, Beijing, China)
	Guang Fan (Ant Group, Hangzhou, China)
	Lipeng Wan (School of Cryptology, University of Chinese Academy of Sciences, Beijing, China)
	Wenxu Tang (School of Cyber Security, University of Science and Technology of China, Hefei, China)
	Yixuan Song (Ant Group, Hangzhou, China)
	Yi Bian (School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China)
	Jingqiang Lin (School of Cyber Security, University of Science and Technology of China, Hefei, China; Beijing Research Institute, University of Science and Technology of China, Beijing, China)
title	ConvKyber: Unleashing the Power of AI Accelerators for Faster Kyber with Novel Iteration-based Approaches
topic	Lattice-based Cryptography; GPUs; Tensor Core; Kyber
url	https://tches.iacr.org/index.php/TCHES/article/view/11420
