A Highly-efficient Lattice-based Post-Quantum Cryptography Processor for IoT Applications
Lattice-Based Cryptography (LBC) schemes, like CRYSTALS-Kyber and CRYSTALS-Dilithium, have been selected to be standardized in the NIST Post-Quantum Cryptography standard. However, implementing these schemes in resource-constrained Internet-of-Things (IoT) devices is challenging, considering efficie...
Main Authors: | Zewen Ye, Ruibing Song, Hao Zhang, Donglong Chen, Ray Chak-Chung Cheung, Kejie Huang |
---|---|
Format: | Article |
Language: | English |
Published: | Ruhr-Universität Bochum, 2024-03-01 |
Series: | Transactions on Cryptographic Hardware and Embedded Systems |
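Subjects: | Post-quantum Cryptography; RISC-V; Single-Instruction-Multiple-Data; Lattice-Based Cryptography; Internet-of-Things |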
Online Access: | https://tches.iacr.org/index.php/TCHES/article/view/11423 |
_version_ | 1827318138784448512 |
---|---|
author | Zewen Ye Ruibing Song Hao Zhang Donglong Chen Ray Chak-Chung Cheung Kejie Huang |
author_facet | Zewen Ye Ruibing Song Hao Zhang Donglong Chen Ray Chak-Chung Cheung Kejie Huang |
author_sort | Zewen Ye |
collection | DOAJ |
description |
Lattice-Based Cryptography (LBC) schemes, such as CRYSTALS-Kyber and CRYSTALS-Dilithium, have been selected for standardization in the NIST Post-Quantum Cryptography standard. However, implementing these schemes on resource-constrained Internet-of-Things (IoT) devices is challenging, given the demands on efficiency, power consumption, area overhead, and flexibility to support various operations and parameter settings. Some existing ASIC designs that prioritize lower power and area cannot achieve optimal performance efficiency, which makes them impractical for battery-powered devices. Custom hardware accelerators in prior co-processor and processor designs have limited applications and flexibility, and they incur significant area and power overheads for IoT devices. To address these challenges, this paper presents an efficient lattice-based cryptography processor with customized Single-Instruction-Multiple-Data (SIMD) instructions. First, our proposed SIMD architecture supports efficient parallel execution of various polynomial operations in 256-bit mode and acceleration of Keccak in 320-bit mode, both utilizing efficiently reused resources. Additionally, we introduce data shuffling hardware units to resolve data dependencies within SIMD data. To further enhance performance, we design a dual-issue path for memory accesses and corresponding software design methodologies to reduce the impact of data load/store blocking. Through a hardware/software co-design approach, our proposed processor achieves high efficiency in supporting all operations in lattice-based cryptography schemes. Evaluations of Kyber and Dilithium show our proposed processor achieves over 10x speedup compared with the baseline RISC-V processor and over 5x speedup versus ARM Cortex-M4 implementations, making it a promising solution for securing IoT communications and storage. Moreover, silicon synthesis results show our design can run at 200 MHz with 2.01 mW for Kyber KEM 512 and 2.13 mW for Dilithium 2, which outperforms state-of-the-art works in terms of PPAP (Performance x Power x Area).
|
first_indexed | 2024-04-24T23:53:49Z |
format | Article |
id | doaj.art-3539585c6bd741c3a5a9ec83345c2557 |
institution | Directory Open Access Journal |
issn | 2569-2925 |
language | English |
last_indexed | 2024-04-24T23:53:49Z |
publishDate | 2024-03-01 |
publisher | Ruhr-Universität Bochum |
record_format | Article |
series | Transactions on Cryptographic Hardware and Embedded Systems |
spelling | doaj.art-3539585c6bd741c3a5a9ec83345c2557 2024-03-14T16:24:49Z eng Ruhr-Universität Bochum Transactions on Cryptographic Hardware and Embedded Systems 2569-2925 2024-03-01 2024 2 10.46586/tches.v2024.i2.130-153 A Highly-efficient Lattice-based Post-Quantum Cryptography Processor for IoT Applications Zewen Ye (Zhejiang University, Hangzhou, China; City University of Hong Kong, Hong Kong, China) Ruibing Song (Zhejiang University, Hangzhou, China) Hao Zhang (Zhejiang University, Hangzhou, China) Donglong Chen (BNU-HKBU United International College, Zhuhai, China) Ray Chak-Chung Cheung (City University of Hong Kong, Hong Kong, China) Kejie Huang (Zhejiang University, Hangzhou, China) https://tches.iacr.org/index.php/TCHES/article/view/11423 Post-quantum Cryptography RISC-V Single-Instruction-Multiple-Data Lattice-Based Cryptography Internet-of-Things |
spellingShingle | Zewen Ye Ruibing Song Hao Zhang Donglong Chen Ray Chak-Chung Cheung Kejie Huang A Highly-efficient Lattice-based Post-Quantum Cryptography Processor for IoT Applications Transactions on Cryptographic Hardware and Embedded Systems Post-quantum Cryptography RISC-V Single-Instruction-Multiple-Data Lattice-Based Cryptography Internet-of-Things |
title | A Highly-efficient Lattice-based Post-Quantum Cryptography Processor for IoT Applications |
title_full | A Highly-efficient Lattice-based Post-Quantum Cryptography Processor for IoT Applications |
title_fullStr | A Highly-efficient Lattice-based Post-Quantum Cryptography Processor for IoT Applications |
title_full_unstemmed | A Highly-efficient Lattice-based Post-Quantum Cryptography Processor for IoT Applications |
title_short | A Highly-efficient Lattice-based Post-Quantum Cryptography Processor for IoT Applications |
title_sort | highly efficient lattice based post quantum cryptography processor for iot applications |
topic | Post-quantum Cryptography RISC-V Single-Instruction-Multiple-Data Lattice-Based Cryptography Internet-of-Things |
url | https://tches.iacr.org/index.php/TCHES/article/view/11423 |
work_keys_str_mv | AT zewenye ahighlyefficientlatticebasedpostquantumcryptographyprocessorforiotapplications AT ruibingsong ahighlyefficientlatticebasedpostquantumcryptographyprocessorforiotapplications AT haozhang ahighlyefficientlatticebasedpostquantumcryptographyprocessorforiotapplications AT donglongchen ahighlyefficientlatticebasedpostquantumcryptographyprocessorforiotapplications AT raychakchungcheung ahighlyefficientlatticebasedpostquantumcryptographyprocessorforiotapplications AT kejiehuang ahighlyefficientlatticebasedpostquantumcryptographyprocessorforiotapplications AT zewenye highlyefficientlatticebasedpostquantumcryptographyprocessorforiotapplications AT ruibingsong highlyefficientlatticebasedpostquantumcryptographyprocessorforiotapplications AT haozhang highlyefficientlatticebasedpostquantumcryptographyprocessorforiotapplications AT donglongchen highlyefficientlatticebasedpostquantumcryptographyprocessorforiotapplications AT raychakchungcheung highlyefficientlatticebasedpostquantumcryptographyprocessorforiotapplications AT kejiehuang highlyefficientlatticebasedpostquantumcryptographyprocessorforiotapplications |
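
The description in this record mentions a 256-bit SIMD mode for polynomial operations and a 320-bit mode for Keccak. The following is a minimal software sketch of what those widths could mean in terms of lanes. It is not the authors' instruction set: the lane widths (16-bit for Kyber, 32-bit for Dilithium, 64-bit for Keccak) and the helper names `lanes_per_vector` and `simd_mod_add` are assumptions made for illustration. Only the moduli (3329 and 8380417) and the 64-bit Keccak lane size come from the public scheme specifications.

```python
# Illustrative sketch only. Lane widths and function names are assumptions,
# not details taken from the paper described in this record.

KYBER_Q = 3329           # Kyber modulus (public scheme parameter)
DILITHIUM_Q = 8380417    # Dilithium modulus (public scheme parameter)
POLY_VECTOR_BITS = 256   # SIMD width for polynomial operations (from the abstract)
KECCAK_VECTOR_BITS = 320 # SIMD width for Keccak (from the abstract)
KECCAK_LANE_BITS = 64    # Keccak-f[1600] uses 25 lanes of 64 bits


def lanes_per_vector(lane_bits: int, vector_bits: int = POLY_VECTOR_BITS) -> int:
    """Number of equally sized lanes that fit in one SIMD vector."""
    return vector_bits // lane_bits


def simd_mod_add(a, b, q, lane_bits):
    """Add two coefficient lists modulo q, one vector-sized chunk at a time.

    Each loop iteration stands in for one hypothetical SIMD instruction
    that processes all lanes of a 256-bit vector together.
    """
    n = lanes_per_vector(lane_bits)
    if len(a) != len(b) or len(a) % n != 0:
        raise ValueError("inputs must have equal length, a multiple of the lane count")
    out = []
    for i in range(0, len(a), n):
        out.extend((x + y) % q for x, y in zip(a[i:i + n], b[i:i + n]))
    return out


# Assumed 16-bit lanes: 16 Kyber coefficients per 256-bit vector.
kyber_sum = simd_mod_add([KYBER_Q - 1] * 16, [1] * 16, KYBER_Q, lane_bits=16)

# 320 bits holds five 64-bit Keccak lanes, i.e. one row of the 5x5 state.
keccak_lanes = lanes_per_vector(KECCAK_LANE_BITS, KECCAK_VECTOR_BITS)
```

Under these assumptions a 256-coefficient Kyber polynomial would need 16 such vector operations for a pointwise addition, and a Dilithium polynomial with 32-bit lanes would need 32.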