Hardware Acceleration of Number Theoretic Transform in zk-SNARK

The proof in zk-SNARK has a fixed length and can be verified quickly, promoting the application of zero-knowledge proof in areas such as digital signature, blockchain, distributed storage, and outsourced computing. However, the generation of proofs is time-consuming and frequently used. As a result,...

Full description

Bibliographic Details
Main Author:	ZHAO Haixu, CHAI Zhilei, HUA Pengcheng, WANG Feng, DING Dong
Format:	Article
Language:	zho
Published:	Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press 2024-02-01
Series:	Jisuanji kexue yu tansuo
Subjects:	field programmable gate array (fpga); zero-knowledge succinct non-interactive arguments of knowledge (zk-snark); modular multiplication; number theoretic transform; hardware acceleration
Online Access:	http://fcst.ceaj.org/fileup/1673-9418/PDF/2211075.pdf

_version_	1827364784290398208
author	ZHAO Haixu, CHAI Zhilei, HUA Pengcheng, WANG Feng, DING Dong
author_facet	ZHAO Haixu, CHAI Zhilei, HUA Pengcheng, WANG Feng, DING Dong
author_sort	ZHAO Haixu, CHAI Zhilei, HUA Pengcheng, WANG Feng, DING Dong
collection	DOAJ
description	The proof in zk-SNARK has a fixed length and can be verified quickly, promoting the application of zero-knowledge proof in areas such as digital signature, blockchain, distributed storage, and outsourced computing. However, the generation of proofs is time-consuming and frequently used. As a result, NTT (number theoretic transform), one of the most time-consuming parts in proof-generation, needs to be accelerated significantly. However, the existing general NTT hardware acceleration methods cannot meet the requirements of large-bitwidth and large-scale in zk-SNARK. To address this issue, this paper proposes a highly pipelined architecture for NTT. First of all, large-bitwidth modular arithmetic is optimized and low-latency Montgomery modular multiplication hardware unit is designed. And then, the large-scale NTT tasks are divided into smaller sub-tasks through two-dimensional partitioning, which improves the parallelism of NTT computation and eliminates the data dependence among sub-tasks, thus reali-zing the pipeline among sub-tasks. Finally, the “data reordering” technique is introduced among multiple rounds of butterfly operations in a sub-task, which effectively alleviates the memory access requirements, thus realizing the bottom-level pipeline in each sub-task, among butterfly operations with different step sizes. This architecture can be flexibly scaled to different scales of FPGAs. The accelerator is prototyped on the AMD-Xilinx Alveo U50 card (UltraScale+XCU50 FPGA). To balance computing efficiency and flexibility, the OpenCL equipped with high-level synthesis (HLS) is used to implement the system. The evaluation results show that the NTT module performs 1.95 times faster than the one in PipeZK and the accelerator achieves 27.98 and 1.74 times speedup, 6.9 and 6 times energy efficiency improvement than AMD Ryzen 9 5900X respectively, when it is integrated into the well-known ZKP open-source project, bellman.
first_indexed	2024-03-08T08:17:18Z
format	Article
id	doaj.art-58d9965dbd1b406b9f47534fc728995a
institution	Directory Open Access Journal
issn	1673-9418
language	zho
last_indexed	2024-03-08T08:17:18Z
publishDate	2024-02-01
publisher	Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press
record_format	Article
series	Jisuanji kexue yu tansuo
spelling	doaj.art-58d9965dbd1b406b9f47534fc728995a2024-02-02T07:10:09ZzhoJournal of Computer Engineering and Applications Beijing Co., Ltd., Science PressJisuanji kexue yu tansuo1673-94182024-02-0118253855210.3778/j.issn.1673-9418.2211075Hardware Acceleration of Number Theoretic Transform in zk-SNARKZHAO Haixu, CHAI Zhilei, HUA Pengcheng, WANG Feng, DING Dong01. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China 2. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China 3. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Wuxi, Jiangsu 214122, ChinaThe proof in zk-SNARK has a fixed length and can be verified quickly, promoting the application of zero-knowledge proof in areas such as digital signature, blockchain, distributed storage, and outsourced computing. However, the generation of proofs is time-consuming and frequently used. As a result, NTT (number theoretic transform), one of the most time-consuming parts in proof-generation, needs to be accelerated significantly. However, the existing general NTT hardware acceleration methods cannot meet the requirements of large-bitwidth and large-scale in zk-SNARK. To address this issue, this paper proposes a highly pipelined architecture for NTT. First of all, large-bitwidth modular arithmetic is optimized and low-latency Montgomery modular multiplication hardware unit is designed. And then, the large-scale NTT tasks are divided into smaller sub-tasks through two-dimensional partitioning, which improves the parallelism of NTT computation and eliminates the data dependence among sub-tasks, thus reali-zing the pipeline among sub-tasks. Finally, the “data reordering” technique is introduced among multiple rounds of butterfly operations in a sub-task, which effectively alleviates the memory access requirements, thus realizing the bottom-level pipeline in each sub-task, among butterfly operations with different step sizes. This architecture can be flexibly scaled to different scales of FPGAs. The accelerator is prototyped on the AMD-Xilinx Alveo U50 card (UltraScale+XCU50 FPGA). To balance computing efficiency and flexibility, the OpenCL equipped with high-level synthesis (HLS) is used to implement the system. The evaluation results show that the NTT module performs 1.95 times faster than the one in PipeZK and the accelerator achieves 27.98 and 1.74 times speedup, 6.9 and 6 times energy efficiency improvement than AMD Ryzen 9 5900X respectively, when it is integrated into the well-known ZKP open-source project, bellman.http://fcst.ceaj.org/fileup/1673-9418/PDF/2211075.pdffield programmable gate array (fpga); zero-knowledge succinct non-interactive arguments of knowledge (zk-snark); modular multiplication; number theoretic transform; hardware acceleration
spellingShingle	ZHAO Haixu, CHAI Zhilei, HUA Pengcheng, WANG Feng, DING Dong Hardware Acceleration of Number Theoretic Transform in zk-SNARK Jisuanji kexue yu tansuo field programmable gate array (fpga); zero-knowledge succinct non-interactive arguments of knowledge (zk-snark); modular multiplication; number theoretic transform; hardware acceleration
title	Hardware Acceleration of Number Theoretic Transform in zk-SNARK
title_full	Hardware Acceleration of Number Theoretic Transform in zk-SNARK
title_fullStr	Hardware Acceleration of Number Theoretic Transform in zk-SNARK
title_full_unstemmed	Hardware Acceleration of Number Theoretic Transform in zk-SNARK
title_short	Hardware Acceleration of Number Theoretic Transform in zk-SNARK
title_sort	hardware acceleration of number theoretic transform in zk snark
topic	field programmable gate array (fpga); zero-knowledge succinct non-interactive arguments of knowledge (zk-snark); modular multiplication; number theoretic transform; hardware acceleration
url	http://fcst.ceaj.org/fileup/1673-9418/PDF/2211075.pdf
work_keys_str_mv	AT zhaohaixuchaizhileihuapengchengwangfengdingdong hardwareaccelerationofnumbertheoretictransforminzksnark

Hardware Acceleration of Number Theoretic Transform in zk-SNARK

Similar Items