EC-Crypto: Highly Efficient Area-Delay Optimized Elliptic Curve Cryptography Processor

Bibliographic Details
Main Authors: Khalid Javeed, Ali El-Moursy, David Gregg
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10143541/
Description
Summary: Security protocols based on Elliptic Curve Cryptography (ECC) require much shorter keys than other public key cryptography (PKC) schemes, which makes ECC the most suitable option for resource-limited devices. This paper presents a highly efficient, area-delay optimized ECC crypto processor over the general prime field ($\mathbb{F}_{p}$). It is built around a novel finite field multiplier (FFM) that incorporates several optimization techniques to reduce latency and hardware resource consumption. The FFM architecture has an embedded finite field adder/subtractor (FFAS) unit, so FFAS operations are performed within the multiplier rather than on a separate dedicated unit. Point multiplication, a core operation in all ECC-based crypto protocols, is computed using Common Z (Co-Z) coordinates with the Montgomery ladder method. The work also proposes an efficient scheduling strategy that executes the low-level finite field arithmetic primitives with minimum latency on the available finite field arithmetic units. As a result of these techniques, the proposed ECC processor is optimized for hardware resources, latency, and throughput. It is described in Verilog HDL, synthesized, and implemented on Virtex-7, Kintex-7, and Virtex-6 FPGA platforms using the Xilinx Vivado and ISE Design Suite tools. On the Virtex-7 FPGA platform, it computes a single 256-bit scalar multiplication in 0.7 ms, consumes just 6.2K slices, and delivers a throughput of 1428 operations per second. The implementation results show that the design outperforms the state of the art, providing a better area-delay product and higher efficiency. It therefore has the potential to be deployed in many applications where both latency and resource requirements are critical.
ISSN: 2169-3536
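
Illustrative note: the summary names the Montgomery ladder as the method used for point multiplication. The sketch below shows the general ladder structure only and is not the paper's design. It uses plain affine coordinates instead of Co-Z coordinates, and a small textbook curve (y^2 = x^3 + 2x + 2 over F_17 with base point (5, 1) of order 19) instead of a 256-bit prime field. The curve parameters and the names point_add and montgomery_ladder are assumptions made for this example.

```python
# Toy parameters (assumed for illustration, not taken from the paper):
# curve y^2 = x^3 + 2x + 2 over F_17, base point G = (5, 1) of order 19.
P_MOD = 17
A = 2
G = (5, 1)
ORDER = 19


def point_add(p1, p2):
    """Affine point addition; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    # P + (-P) = infinity (this also covers doubling a point with y = 0).
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p1 == p2:
        # Tangent slope for doubling.
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        # Chord slope for adding two distinct points.
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def montgomery_ladder(k, point):
    """Compute k * point, scanning the bits of k from most significant down.

    Every iteration performs one addition and one doubling regardless of
    the bit value, and the invariant r1 - r0 == point holds throughout.
    """
    r0, r1 = None, point
    for bit in bin(k)[2:]:
        if bit == "1":
            r0, r1 = point_add(r0, r1), point_add(r1, r1)
        else:
            r0, r1 = point_add(r0, r0), point_add(r0, r1)
    return r0


if __name__ == "__main__":
    for k in range(ORDER + 1):
        print(k, montgomery_ladder(k, G))
```

Because the difference between the two running points is always the base point, the ladder is a natural fit for coordinate systems such as Co-Z that exploit a shared value between the two points. This sketch does not attempt to reproduce that optimization, and Python's branching on the key bit means it is not a constant-time implementation. The modular inverse via pow(x, -1, m) requires Python 3.8 or later.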