Hardware Acceleration of Number Theoretic Transform in zk-SNARK

The proof in zk-SNARK has a fixed length and can be verified quickly, promoting the application of zero-knowledge proof in areas such as digital signature, blockchain, distributed storage, and outsourced computing. However, the generation of proofs is time-consuming and frequently used. As a result,...

Full description

Bibliographic Details
Main Author: ZHAO Haixu, CHAI Zhilei, HUA Pengcheng, WANG Feng, DING Dong
Format: Article
Language:zho
Published: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press 2024-02-01
Series:Jisuanji kexue yu tansuo
Subjects:
Online Access:http://fcst.ceaj.org/fileup/1673-9418/PDF/2211075.pdf
_version_ 1827364784290398208
author ZHAO Haixu, CHAI Zhilei, HUA Pengcheng, WANG Feng, DING Dong
author_facet ZHAO Haixu, CHAI Zhilei, HUA Pengcheng, WANG Feng, DING Dong
author_sort ZHAO Haixu, CHAI Zhilei, HUA Pengcheng, WANG Feng, DING Dong
collection DOAJ
description The proof in zk-SNARK has a fixed length and can be verified quickly, promoting the application of zero-knowledge proof in areas such as digital signature, blockchain, distributed storage, and outsourced computing. However, the generation of proofs is time-consuming and frequently used. As a result, NTT (number theoretic transform), one of the most time-consuming parts in proof-generation, needs to be accelerated significantly. However, the existing general NTT hardware acceleration methods cannot meet the requirements of large-bitwidth and large-scale in zk-SNARK. To address this issue, this paper proposes a highly pipelined architecture for NTT. First of all, large-bitwidth modular arithmetic is optimized and low-latency Montgomery modular multiplication hardware unit is designed. And then, the large-scale NTT tasks are divided into smaller sub-tasks through two-dimensional partitioning, which improves the parallelism of NTT computation and eliminates the data dependence among sub-tasks, thus reali-zing the pipeline among sub-tasks. Finally, the “data reordering” technique is introduced among multiple rounds of butterfly operations in a sub-task, which effectively alleviates the memory access requirements, thus realizing the bottom-level pipeline in each sub-task, among butterfly operations with different step sizes. This architecture can be flexibly scaled to different scales of FPGAs. The accelerator is prototyped on the AMD-Xilinx Alveo U50 card (UltraScale+XCU50 FPGA). To balance computing efficiency and flexibility, the OpenCL equipped with high-level synthesis (HLS) is used to implement the system. The evaluation results show that the NTT module performs 1.95 times faster than the one in PipeZK and the accelerator achieves 27.98 and 1.74 times speedup, 6.9 and 6 times energy efficiency improvement than AMD Ryzen 9 5900X respectively, when it is integrated into the well-known ZKP open-source project, bellman.
first_indexed 2024-03-08T08:17:18Z
format Article
id doaj.art-58d9965dbd1b406b9f47534fc728995a
institution Directory Open Access Journal
issn 1673-9418
language zho
last_indexed 2024-03-08T08:17:18Z
publishDate 2024-02-01
publisher Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press
record_format Article
series Jisuanji kexue yu tansuo
spelling doaj.art-58d9965dbd1b406b9f47534fc728995a2024-02-02T07:10:09ZzhoJournal of Computer Engineering and Applications Beijing Co., Ltd., Science PressJisuanji kexue yu tansuo1673-94182024-02-0118253855210.3778/j.issn.1673-9418.2211075Hardware Acceleration of Number Theoretic Transform in zk-SNARKZHAO Haixu, CHAI Zhilei, HUA Pengcheng, WANG Feng, DING Dong01. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China 2. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China 3. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Wuxi, Jiangsu 214122, ChinaThe proof in zk-SNARK has a fixed length and can be verified quickly, promoting the application of zero-knowledge proof in areas such as digital signature, blockchain, distributed storage, and outsourced computing. However, the generation of proofs is time-consuming and frequently used. As a result, NTT (number theoretic transform), one of the most time-consuming parts in proof-generation, needs to be accelerated significantly. However, the existing general NTT hardware acceleration methods cannot meet the requirements of large-bitwidth and large-scale in zk-SNARK. To address this issue, this paper proposes a highly pipelined architecture for NTT. First of all, large-bitwidth modular arithmetic is optimized and low-latency Montgomery modular multiplication hardware unit is designed. And then, the large-scale NTT tasks are divided into smaller sub-tasks through two-dimensional partitioning, which improves the parallelism of NTT computation and eliminates the data dependence among sub-tasks, thus reali-zing the pipeline among sub-tasks. Finally, the “data reordering” technique is introduced among multiple rounds of butterfly operations in a sub-task, which effectively alleviates the memory access requirements, thus realizing the bottom-level pipeline in each sub-task, among butterfly operations with different step sizes. This architecture can be flexibly scaled to different scales of FPGAs. The accelerator is prototyped on the AMD-Xilinx Alveo U50 card (UltraScale+XCU50 FPGA). To balance computing efficiency and flexibility, the OpenCL equipped with high-level synthesis (HLS) is used to implement the system. The evaluation results show that the NTT module performs 1.95 times faster than the one in PipeZK and the accelerator achieves 27.98 and 1.74 times speedup, 6.9 and 6 times energy efficiency improvement than AMD Ryzen 9 5900X respectively, when it is integrated into the well-known ZKP open-source project, bellman.http://fcst.ceaj.org/fileup/1673-9418/PDF/2211075.pdffield programmable gate array (fpga); zero-knowledge succinct non-interactive arguments of knowledge (zk-snark); modular multiplication; number theoretic transform; hardware acceleration
spellingShingle ZHAO Haixu, CHAI Zhilei, HUA Pengcheng, WANG Feng, DING Dong
Hardware Acceleration of Number Theoretic Transform in zk-SNARK
Jisuanji kexue yu tansuo
field programmable gate array (fpga); zero-knowledge succinct non-interactive arguments of knowledge (zk-snark); modular multiplication; number theoretic transform; hardware acceleration
title Hardware Acceleration of Number Theoretic Transform in zk-SNARK
title_full Hardware Acceleration of Number Theoretic Transform in zk-SNARK
title_fullStr Hardware Acceleration of Number Theoretic Transform in zk-SNARK
title_full_unstemmed Hardware Acceleration of Number Theoretic Transform in zk-SNARK
title_short Hardware Acceleration of Number Theoretic Transform in zk-SNARK
title_sort hardware acceleration of number theoretic transform in zk snark
topic field programmable gate array (fpga); zero-knowledge succinct non-interactive arguments of knowledge (zk-snark); modular multiplication; number theoretic transform; hardware acceleration
url http://fcst.ceaj.org/fileup/1673-9418/PDF/2211075.pdf
work_keys_str_mv AT zhaohaixuchaizhileihuapengchengwangfengdingdong hardwareaccelerationofnumbertheoretictransforminzksnark