Algorithm-Hardware Co-Design of Split-Radix Discrete Galois Transformation for KyberKEM
Cited by: 4
Authors:
Li, Guangyan [1]
Chen, Donglong [2]
Mao, Gaoyu [1]
Dai, Wangchen [3]
Sanka, Abdurrashid Ibrahim [1]
Cheung, Ray C. C. [1]
Affiliations:
[1] City Univ Hong Kong, Dept Elect Engn, Kowloon Tong, Hong Kong, Peoples R China
[2] BNU HKBU United Int Coll, Fac Sci & Technol, Zhuhai 519088, Guangdong, Peoples R China
[3] Zhejiang Lab, Hangzhou 311121, Zhejiang, Peoples R China
Keywords:
Discrete Galois transform;
split-radix;
negative wrapped convolution;
post-quantum cryptography;
key encapsulation mechanism;
hardware;
FPGA;
CRYSTALS-KYBER;
DOI:
10.1109/TETC.2023.3270971
Chinese Library Classification:
TP [Automation technology, computer technology];
Discipline classification code:
0812;
Abstract:
KyberKEM is one of the final-round key encapsulation mechanisms in the NIST post-quantum cryptography competition. The number theoretic transform (NTT), as the computing bottleneck of KyberKEM, has been widely studied. The Discrete Galois Transformation (DGT) is a variant of NTT that halves the transform length but, in theoretical analysis, requires more multiplication operations than the latest NTT algorithm. This paper proposes the split-radix DGT, a novel DGT variant that uses the split-radix method to reduce the computing complexity without compromising the transform length. Specifically, for a length-128 polynomial, the split-radix DGT algorithm saves at least 10% of multiplication operations compared with the latest NTT algorithm in theoretical analysis. Furthermore, we propose a unified split-radix DGT processor with a dedicated stream permutation network for KyberKEM and implement it on the Xilinx Artix-7 FPGA. The processor achieves at least 49.4% faster transformation and 65.3% faster component-wise multiplication (CWM), with LUT-NTT and LUT-CWM area-time products of at most 87% and 32%, respectively, compared with the state-of-the-art polynomial multipliers in KyberKEM with the same BFU setting on similar platforms. Lastly, we design a highly efficient KyberKEM architecture using the proposed split-radix DGT processor. The implementation results on the Artix-7 FPGA show significant performance improvements over the state-of-the-art KyberKEM designs.
Pages: 824-838
Page count: 15