Algorithm-Hardware Co-Design of Split-Radix Discrete Galois Transformation for KyberKEM

Cited by: 4

Authors:

Li, Guangyan [1]

Chen, Donglong [2]

Mao, Gaoyu [1]

Dai, Wangchen [3]

Sanka, Abdurrashid Ibrahim [1]

Cheung, Ray C. C. [1]

Affiliations:

[1] City Univ Hong Kong, Dept Elect Engn, Kowloon Tong, Hong Kong, Peoples R China

[2] BNU HKBU United Int Coll, Fac Sci & Technol, Zhuhai 519088, Guangdong, Peoples R China

[3] Zhejiang Lab, Hangzhou 311121, Zhejiang, Peoples R China

Source:

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING | 2023 / Vol. 11 / Issue 04

Keywords:

Discrete Galois transform; split-radix; negative wrapped convolution; post-quantum cryptography; key encapsulation mechanism; hardware; FPGA; CRYSTALS-KYBER

DOI:

10.1109/TETC.2023.3270971

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

KyberKEM is one of the final-round key encapsulation mechanisms in the NIST post-quantum cryptography competition. The number theoretic transform (NTT), the computing bottleneck of KyberKEM, has been widely studied. The Discrete Galois Transformation (DGT) is a variant of the NTT that halves the transform length but, in theoretical analysis, requires more multiplication operations than the latest NTT algorithm. This paper proposes the split-radix DGT, a novel DGT variant that uses the split-radix method to reduce the computing complexity without compromising the transform length. Specifically, for a length-128 polynomial, the split-radix DGT algorithm saves at least 10% of the multiplication operations compared with the latest NTT algorithm in theoretical analysis. Furthermore, we propose a unified split-radix DGT processor with a dedicated stream permutation network for KyberKEM and implement it on the Xilinx Artix-7 FPGA. The processor achieves at least 49.4% faster transformation and 65.3% faster component-wise multiplication, with at most 87% of the LUT-NTT area-time product and 32% of the LUT-CWM area-time product, compared with the state-of-the-art polynomial multipliers in KyberKEM with the same BFU setting on similar platforms. Lastly, we design a highly efficient KyberKEM architecture using the proposed split-radix DGT processor. The implementation results on the Artix-7 FPGA show significant performance improvements over state-of-the-art KyberKEM designs.
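
For orientation, the negative wrapped convolution listed in the keywords is the polynomial product in Z_q[x]/(x^n + 1) that NTT and DGT processors accelerate. The sketch below is a quadratic-time schoolbook reference, not the paper's split-radix DGT. The function name `negacyclic_mul` is hypothetical, and the modulus q = 3329 is the standard Kyber parameter, assumed here because the abstract does not state it.

```python
Q = 3329  # Kyber modulus; an assumption, not stated in the abstract


def negacyclic_mul(a, b, q=Q):
    """Schoolbook product of a and b in Z_q[x]/(x^n + 1).

    a and b are coefficient lists of equal length n, lowest degree first.
    This is a reference for checking faster transforms, not an optimized
    or hardware-oriented implementation.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("operands must have the same length")
    c = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                c[k] = (c[k] + ai * bj) % q
            else:
                # x^n = -1 in this ring, so wrapped terms are subtracted
                c[k - n] = (c[k - n] - ai * bj) % q
    return c


# x * x = x^2 = -1 in Z_q[x]/(x^2 + 1)
print(negacyclic_mul([0, 1], [0, 1]))  # [3328, 0]
```

A transform-based multiplier can be validated against this reference on random inputs, since both should return identical coefficient lists.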


Pages: 824-838

Page count: 15

50 items in total

[1] Algorithm-hardware Co-design for Deformable Convolution
Huang, Qijing
Wang, Dequan
Gao, Yizhao
Cai, Yaohui
Dong, Zhen
Wu, Bichen
Keutzer, Kurt
Wawrzynek, John
FIFTH WORKSHOP ON ENERGY EFFICIENT MACHINE LEARNING AND COGNITIVE COMPUTING - NEURIPS EDITION (EMC2-NIPS 2019), 2019, : 48 - 51
[2] Split-Radix Algorithm for the Discrete Hirschman Transform
Xue, Dingli
DeBrunner, Linda
DeBrunner, Victor
Huang, Zhen
IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 199 - 203
[3] Algorithm-hardware co-design of ultra-high radix based high throughput modular multiplier
Xiao, Hao
Liu, Yuxuan
Li, Zhenmin
Liu, Guangzhu
IEICE ELECTRONICS EXPRESS, 2021, 18 (10):
[4] Algorithm-hardware Co-design of Attention Mechanism on FPGA Devices
Zhang, Xinyi
Wu, Yawen
Zhou, Peipei
Tang, Xulong
Hu, Jingtong
ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2021, 20 (05)
[5] Toolflow for the algorithm-hardware co-design of memristive ANN accelerators
Wabnitz, Malte
Gemmeke, Tobias
Memories - Materials, Devices, Circuits and Systems, 2023, 5
[6] New split-radix algorithm for the discrete Hartley transform
Bi, G
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 1997, 45 (02) : 297 - 302
[7] Algorithm-Hardware Co-design for BQSR Acceleration in Genome Analysis ToolKit
Lo, Michael
Fang, Zhenman
Wang, Jie
Zhou, Peipei
Chang, Mau-Chung Frank
Cong, Jason
28TH IEEE INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM), 2020, : 157 - 166
[8] Optimizing Deep Learning Efficiency through Algorithm-Hardware Co-design
Santoso, Joseph T.
Wibowo, Mars C.
Raharjo, Budi
JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, 2024, 15 (10) : 1163 - 1173
[9] Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs
Yang, Yifan
Huang, Qijing
Wu, Bichen
Zhang, Tianjun
Ma, Liang
Gambardella, Giulio
Blott, Michaela
Lavagno, Luciano
Vissers, Kees
Wawrzynek, John
Keutzer, Kurt
PROCEEDINGS OF THE 2019 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS (FPGA'19), 2019, : 23 - 32
[10] CSCNN: Algorithm-hardware Co-design for CNN Accelerators using Centrosymmetric Filters
Li, Jiajun
Louri, Ahmed
Karanth, Avinash
Bunescu, Razvan
2021 27TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2021), 2021, : 612 - 625
