Fast Implementation of SHA-3 in GPU Environment
Recently, Graphic Processing Units (GPUs) have been widely used for general purpose applications such as machine learning applications, acceleration of cryptographic applications (especially, blockchains), etc. The development of CUDA makes this General-Purpose computing on GPU possible. In particul...
Main Authors: | Hojin Choi, Seog Chung Seo |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2021-01-01 |
Series: | IEEE Access |
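Subjects: | Graphic Processing Unit (GPU); secure hash function; Secure Hash Algorithm (SHA)-3; software optimization; NVIDIA CUDA; parallel processing |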
Online Access: | https://ieeexplore.ieee.org/document/9585122/ |
_version_ | 1818826079240978432 |
---|---|
author | Hojin Choi, Seog Chung Seo |
author_facet | Hojin Choi, Seog Chung Seo |
author_sort | Hojin Choi |
collection | DOAJ |
description | Graphics Processing Units (GPUs) are now widely used for general-purpose applications such as machine learning and the acceleration of cryptographic workloads (especially blockchains). The development of CUDA made this general-purpose computing on GPUs possible. In particular, GPUs are widely used in server-side applications to provide fast and efficient service to many clients: such servers must process large amounts of user data and carry out authentication. Verifying the integrity of transmitted data is essential to ensure that it was not modified in transit. Hash functions are the cryptographic algorithms used to verify data integrity, and the standard hash functions are SHA-1, SHA-2, and SHA-3. Keccak was selected as the winner of the NIST SHA-3 competition in 2012 and standardized as SHA-3 in 2015. However, software implementations of SHA-3 have so far not provided enough performance for various applications. In addition, SHA-3 and the SHAKE functions built on it are used in many Post-Quantum Cryptosystems (PQC) submitted to the NIST PQC competition. Research on optimizing SHA-3 in software is therefore required. We propose an optimized SHA-3 software implementation for GPUs. For performance, we propose several techniques: optimization of the SHA-3 internal process, inline PTX optimization, optimized memory usage, and the use of asynchronous CUDA streams. With the proposed optimizations, our SHA-3(512) (resp. SHA-3(256)) implementation without CUDA streams provides a maximum throughput of 88.51 Gb/s (resp. 171.62 Gb/s) on an RTX2080Ti GPU. Furthermore, without CUDA streams, our SHA-3(512) software on a GTX1070 provides about 49.73% higher throughput than the previous best work on a GTX1080, which shows the effectiveness of the proposed optimization methods. Our optimized SHA-3 software on GPU can be used efficiently for blockchain applications and several PQCs (especially the key generation process in lattice-based cryptosystems). |
first_indexed | 2024-12-19T00:21:57Z |
format | Article |
id | doaj.art-04dc217ff4864ed68ffb285f866b75ea |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-19T00:21:57Z |
publishDate | 2021-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-04dc217ff4864ed68ffb285f866b75ea; 2022-12-21T20:45:28Z; eng; IEEE; IEEE Access; 2169-3536; 2021-01-01; vol. 9, pp. 144574-144586; 10.1109/ACCESS.2021.3122466; 9585122; Fast Implementation of SHA-3 in GPU Environment; Hojin Choi (https://orcid.org/0000-0002-7298-3689), Department of Financial Information Security, Kookmin University, Seoul, South Korea; Seog Chung Seo (https://orcid.org/0000-0001-8016-2808), Department of Financial Information Security, Kookmin University, Seoul, South Korea; https://ieeexplore.ieee.org/document/9585122/; Graphic Processing Unit (GPU); secure hash function; Secure Hash Algorithm (SHA)-3; software optimization; NVIDIA CUDA; parallel processing |
spellingShingle | Hojin Choi; Seog Chung Seo; Fast Implementation of SHA-3 in GPU Environment; IEEE Access; Graphic Processing Unit (GPU); secure hash function; Secure Hash Algorithm (SHA)-3; software optimization; NVIDIA CUDA; parallel processing |
title | Fast Implementation of SHA-3 in GPU Environment |
title_full | Fast Implementation of SHA-3 in GPU Environment |
title_fullStr | Fast Implementation of SHA-3 in GPU Environment |
title_full_unstemmed | Fast Implementation of SHA-3 in GPU Environment |
title_short | Fast Implementation of SHA-3 in GPU Environment |
title_sort | fast implementation of sha 3 in gpu environment |
topic | Graphic Processing Unit (GPU); secure hash function; Secure Hash Algorithm (SHA)-3; software optimization; NVIDIA CUDA; parallel processing |
url | https://ieeexplore.ieee.org/document/9585122/ |
work_keys_str_mv | AT hojinchoi fastimplementationofsha3ingpuenvironment AT seogchungseo fastimplementationofsha3ingpuenvironment |