Algorithm-Hardware Co-Design of Split-Radix Discrete Galois Transformation for KyberKEM

Cited by: 4
Authors
Li, Guangyan [1]
Chen, Donglong [2]
Mao, Gaoyu [1]
Dai, Wangchen [3]
Sanka, Abdurrashid Ibrahim [1]
Cheung, Ray C. C. [1]
Affiliations
[1] City Univ Hong Kong, Dept Elect Engn, Kowloon Tong, Hong Kong, Peoples R China
[2] BNU HKBU United Int Coll, Fac Sci & Technol, Zhuhai 519088, Guangdong, Peoples R China
[3] Zhejiang Lab, Hangzhou 311121, Zhejiang, Peoples R China
Keywords
Discrete Galois transform; split-radix; negative wrapped convolution; post-quantum cryptography; key encapsulation mechanism; hardware; FPGA; CRYSTALS-KYBER
DOI
10.1109/TETC.2023.3270971
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
KyberKEM is one of the final-round key encapsulation mechanisms in the NIST post-quantum cryptography competition. The number theoretic transform (NTT), the computing bottleneck of KyberKEM, has been widely studied. The Discrete Galois Transformation (DGT) is a variant of the NTT that halves the transform length but, in theoretical analysis, requires more multiplication operations than the latest NTT algorithm. This paper proposes the split-radix DGT, a novel DGT variant that uses the split-radix method to reduce the computing complexity without compromising the transform length. Specifically, for a length-128 polynomial, the split-radix DGT algorithm saves at least 10% of the multiplication operations compared with the latest NTT algorithm in theoretical analysis. Furthermore, we propose a unified split-radix DGT processor with a dedicated stream permutation network for KyberKEM and implement it on a Xilinx Artix-7 FPGA. Compared with state-of-the-art polynomial multipliers for KyberKEM with the same BFU setting on similar platforms, the processor achieves at least 49.4% faster transformation and 65.3% faster component-wise multiplication, with at most 87% of the LUT-NTT area-time product and 32% of the LUT-CWM area-time product. Lastly, we design a highly efficient KyberKEM architecture using the proposed split-radix DGT processor. The implementation results on the Artix-7 FPGA show significant performance improvements over state-of-the-art KyberKEM designs.
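The abstract's remark that the DGT halves the transform length can be illustrated with a small sketch. It is not the paper's split-radix algorithm and uses no fast transform at all. It only checks the underlying folding idea: a length-n polynomial in Z_q[x]/(x^n + 1) can be packed into n/2 Gaussian integers and multiplied modulo x^(n/2) - i, giving the same product as the negative wrapped convolution. The modulus q = 3329 is the Kyber modulus; the function names (`negacyclic_mul`, `fold`, `unfold`, `folded_mul`) are hypothetical and chosen for this example only.

```python
import random

Q = 3329  # Kyber modulus, used here as an assumption for illustration


def negacyclic_mul(a, b, q=Q):
    """Schoolbook product in Z_q[x]/(x^n + 1) (negative wrapped convolution)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                # x^n = -1, so wrapped terms are subtracted
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c


def fold(a):
    """Pack n coefficients into n/2 Gaussian integers a[j] + i*a[j + n/2]."""
    h = len(a) // 2
    return [(a[j], a[j + h]) for j in range(h)]


def unfold(g):
    """Inverse of fold: real parts form the low half, imaginary parts the high half."""
    return [re for re, _ in g] + [im for _, im in g]


def gauss_mul(x, y, q=Q):
    """Multiply two Gaussian integers (re, im) with components reduced mod q."""
    (a, b), (c, d) = x, y
    return ((a * c - b * d) % q, (a * d + b * c) % q)


def folded_mul(fa, fb, q=Q):
    """Schoolbook product in Z_q[i][x]/(x^h - i), where h = len(fa)."""
    h = len(fa)
    out = [(0, 0)] * h
    for i in range(h):
        for j in range(h):
            p = gauss_mul(fa[i], fb[j], q)
            k = i + j
            if k >= h:
                # x^h = i, so a wrapped term is multiplied by i:
                # (re + i*im) * i = -im + i*re
                p = ((-p[1]) % q, p[0])
                k -= h
            out[k] = ((out[k][0] + p[0]) % q, (out[k][1] + p[1]) % q)
    return out


if __name__ == "__main__":
    random.seed(0)
    n = 256  # the folded operands then have length 128
    a = [random.randrange(Q) for _ in range(n)]
    b = [random.randrange(Q) for _ in range(n)]
    assert unfold(folded_mul(fold(a), fold(b))) == negacyclic_mul(a, b)
```

With n = 256 the folded operands have 128 entries, which is consistent with the length-128 figure quoted in the abstract. A real DGT would replace the quadratic `folded_mul` loop with a fast transform over the Gaussian integers; that step is deliberately left out here.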
Pages: 824-838
Page count: 15