Algorithm-Hardware Co-Design of Split-Radix Discrete Galois Transformation for KyberKEM

Times cited: 4
Authors
Li, Guangyan [1]
Chen, Donglong [2]
Mao, Gaoyu [1]
Dai, Wangchen [3]
Sanka, Abdurrashid Ibrahim [1]
Cheung, Ray C. C. [1]
Affiliations
[1] City Univ Hong Kong, Dept Elect Engn, Kowloon Tong, Hong Kong, Peoples R China
[2] BNU HKBU United Int Coll, Fac Sci & Technol, Zhuhai 519088, Guangdong, Peoples R China
[3] Zhejiang Lab, Hangzhou 311121, Zhejiang, Peoples R China
Keywords
Discrete Galois transform; split-radix; negative wrapped convolution; post-quantum cryptography; key encapsulation mechanism; hardware; FPGA; CRYSTALS-KYBER
DOI
10.1109/TETC.2023.3270971
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
KyberKEM is one of the final-round key encapsulation mechanisms in the NIST post-quantum cryptography competition. The number theoretic transform (NTT), the computational bottleneck of KyberKEM, has been widely studied. The Discrete Galois Transformation (DGT) is a variant of the NTT that halves the transform length but, in theoretical analysis, requires more multiplication operations than the latest NTT algorithm. This paper proposes the split-radix DGT, a novel DGT variant that uses the split-radix method to reduce the computational complexity without compromising the transform length. Specifically, for a length-128 polynomial, the split-radix DGT algorithm saves at least 10% of the multiplication operations compared with the latest NTT algorithm in theoretical analysis. Furthermore, we propose a unified split-radix DGT processor with a dedicated stream permutation network for KyberKEM and implement it on a Xilinx Artix-7 FPGA. The processor achieves at least 49.4% faster transformation and 65.3% faster component-wise multiplication, with LUT-NTT and LUT-CWM area-time products of at most 87% and 32%, respectively, relative to the state-of-the-art polynomial multipliers for KyberKEM with the same BFU setting on similar platforms. Lastly, we design a highly efficient KyberKEM architecture using the proposed split-radix DGT processor. The implementation results on the Artix-7 FPGA show significant performance improvements over state-of-the-art KyberKEM designs.
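For readers unfamiliar with the negative wrapped convolution named in the keywords, the following is a minimal Python sketch of polynomial multiplication in Z_q[x]/(x^n + 1), computed once term by term and once through a naive transform with component-wise multiplication. It is not the paper's split-radix DGT and does not use KyberKEM's parameters: the toy values n = 4, q = 17, psi = 2 and all function names are assumptions chosen only so the arithmetic is easy to check by hand. It needs Python 3.8 or later for `pow(x, -1, q)`.

```python
# Toy parameters (assumptions, not KyberKEM's): n = 4, q = 17, psi = 2.
# psi is a primitive 2n-th root of unity mod q, since 2**4 % 17 == 16 == -1.
N, Q, PSI = 4, 17, 2


def negacyclic_schoolbook(a, b, q=Q):
    """Multiply a and b in Z_q[x]/(x^n + 1) term by term."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                # x^n = -1, so terms that wrap around are subtracted.
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c


def naive_ntt(a, root, q=Q):
    """O(n^2) transform: A[k] = sum_j a[j] * root^(j*k) mod q."""
    n = len(a)
    return [sum(a[j] * pow(root, j * k, q) for j in range(n)) % q
            for k in range(n)]


def negacyclic_ntt(a, b, psi=PSI, q=Q):
    """Same product via twist, transform, component-wise multiply, inverse."""
    n = len(a)
    omega = psi * psi % q  # primitive n-th root of unity
    # Scaling coefficient j by psi^j turns the negacyclic product into a cyclic one.
    at = [a[j] * pow(psi, j, q) % q for j in range(n)]
    bt = [b[j] * pow(psi, j, q) % q for j in range(n)]
    A, B = naive_ntt(at, omega, q), naive_ntt(bt, omega, q)
    C = [x * y % q for x, y in zip(A, B)]      # component-wise multiplication
    ct = naive_ntt(C, pow(omega, -1, q), q)    # inverse transform, still unscaled
    n_inv, psi_inv = pow(n, -1, q), pow(psi, -1, q)
    # Undo the 1/n scaling and the psi^j twist.
    return [ct[j] * n_inv * pow(psi_inv, j, q) % q for j in range(n)]


a, b = [1, 2, 3, 4], [5, 6, 7, 8]
assert negacyclic_schoolbook(a, b) == negacyclic_ntt(a, b)
```

The transform here costs O(n^2) on purpose, to keep the sketch short. Fast NTT and DGT algorithms such as those the abstract compares replace `naive_ntt` with butterfly structures, and the number of multiplications in those butterflies is what the abstract's 10% figure refers to.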
Pages: 824-838
Number of pages: 15