A Highly-efficient Lattice-based Post-Quantum Cryptography Processor for IoT Applications

Lattice-Based Cryptography (LBC) schemes, like CRYSTALS-Kyber and CRYSTALS-Dilithium, have been selected to be standardized in the NIST Post-Quantum Cryptography standard. However, implementing these schemes in resource-constrained Internet-of-Things (IoT) devices is challenging, considering efficie...

Bibliographic Details
Main Authors: Zewen Ye, Ruibing Song, Hao Zhang, Donglong Chen, Ray Chak-Chung Cheung, Kejie Huang
Format: Article
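Language: English
Published: Ruhr-Universität Bochum 2024-03-01
Series: Transactions on Cryptographic Hardware and Embedded Systems
Subjects: Post-quantum Cryptography, RISC-V, Single-Instruction-Multiple-Data, Lattice-Based Cryptography, Internet-of-Things
Online Access: https://tches.iacr.org/index.php/TCHES/article/view/11423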
_version_ 1827318138784448512
author Zewen Ye
Ruibing Song
Hao Zhang
Donglong Chen
Ray Chak-Chung Cheung
Kejie Huang
author_sort Zewen Ye
collection DOAJ
description Lattice-Based Cryptography (LBC) schemes, like CRYSTALS-Kyber and CRYSTALS-Dilithium, have been selected to be standardized in the NIST Post-Quantum Cryptography standard. However, implementing these schemes in resource-constrained Internet-of-Things (IoT) devices is challenging, considering efficiency, power consumption, area overhead, and flexibility to support various operations and parameter settings. Some existing ASIC designs that prioritize lower power and area cannot achieve optimal performance efficiency, which makes them impractical for battery-powered devices. Custom hardware accelerators in prior co-processor and processor designs have limited applications and flexibility, incurring significant area and power overheads for IoT devices. To address these challenges, this paper presents an efficient lattice-based cryptography processor with customized Single-Instruction-Multiple-Data (SIMD) instructions. First, our proposed SIMD architecture supports efficient parallel execution of various polynomial operations in 256-bit mode and acceleration of Keccak in 320-bit mode, both utilizing efficiently reused resources. Additionally, we introduce data shuffling hardware units to resolve data dependencies within SIMD data. To further enhance performance, we design a dual-issue path for memory accesses and corresponding software design methodologies to reduce the impact of data load/store blocking. Through a hardware/software co-design approach, our proposed processor achieves high efficiency in supporting all operations in lattice-based cryptography schemes. Evaluations of Kyber and Dilithium show that our proposed processor achieves over 10x speedup compared with the baseline RISC-V processor and over 5x speedup versus ARM Cortex M4 implementations, making it a promising solution for securing IoT communications and storage. Moreover, silicon synthesis results show that our design can run at 200 MHz with 2.01 mW for Kyber KEM 512 and 2.13 mW for Dilithium 2, which outperforms state-of-the-art works in terms of PPAP (Performance x Power x Area).
first_indexed 2024-04-24T23:53:49Z
format Article
id doaj.art-3539585c6bd741c3a5a9ec83345c2557
institution Directory Open Access Journal
issn 2569-2925
language English
last_indexed 2024-04-24T23:53:49Z
publishDate 2024-03-01
publisher Ruhr-Universität Bochum
record_format Article
series Transactions on Cryptographic Hardware and Embedded Systems
spelling doaj.art-3539585c6bd741c3a5a9ec83345c2557
2024-03-14T16:24:49Z
eng
Ruhr-Universität Bochum
Transactions on Cryptographic Hardware and Embedded Systems
2569-2925
2024-03-01
Volume 2024, Issue 2
DOI 10.46586/tches.v2024.i2.130-153
A Highly-efficient Lattice-based Post-Quantum Cryptography Processor for IoT Applications
Zewen Ye (Zhejiang University, Hangzhou, China; City University of Hong Kong, Hong Kong, China)
Ruibing Song (Zhejiang University, Hangzhou, China)
Hao Zhang (Zhejiang University, Hangzhou, China)
Donglong Chen (BNU-HKBU United International College, Zhuhai, China)
Ray Chak-Chung Cheung (City University of Hong Kong, Hong Kong, China)
Kejie Huang (Zhejiang University, Hangzhou, China)
https://tches.iacr.org/index.php/TCHES/article/view/11423
Post-quantum Cryptography
RISC-V
Single-Instruction-Multiple-Data
Lattice-Based Cryptography
Internet-of-Things
title A Highly-efficient Lattice-based Post-Quantum Cryptography Processor for IoT Applications
title_sort highly efficient lattice based post quantum cryptography processor for iot applications
topic Post-quantum Cryptography
RISC-V
Single-Instruction-Multiple-Data
Lattice-Based Cryptography
Internet-of-Things
url https://tches.iacr.org/index.php/TCHES/article/view/11423