AC-PM: An Area-Efficient and Configurable Polynomial Multiplier for Lattice Based Cryptography

被引:26
作者
Hu, Xiao [1 ]
Tian, Jing [1 ]
Li, Minghao [1 ]
Wang, Zhongfeng [1 ]
机构
[1] Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210023, Peoples R China
基金
中国国家自然科学基金;
关键词
Hardware; Complexity theory; Field programmable gate arrays; Cryptography; Throughput; Transforms; Optimization; Lattice-based cryptography (LBC); number theoretic transform (NTT); polynomial multiplication; hardware implementation; FPGA;
D O I
10.1109/TCSI.2022.3218192
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
As the computation bottleneck in lattice-based cryptography (LBC), the polynomial multiplication based on number theoretic transform (NTT) has been continuously studied for flexible hardware implementations with high area-efficiency. This paper presents an area-efficient and configurable NTT-based polynomial multiplier (AC-PM) incorporating algorithmic and architectural level optimization techniques. For the core operation of polynomial multiplication, two low-complexity and fast modular multiplication algorithms are introduced with loose constraints of LBC-friendly primes. Based on the proposed algorithms, a reconfigurable processing element (RPE) is dedicatedly designed to execute all the operations in an NTT-based polynomial multiplication: NTT, inverse NTT (INTT), and coefficient-wise multiplication (CWM). The proposed AC-PM can be configured with different numbers of RPEs and supports various polynomial degrees without recompilation. Additionally, the dataflow complexity is greatly simplified. More importantly, to the best of our knowledge, the twiddle factors are reused, for the first time, to support both NTT and INTT with multiple polynomial degrees, which leads to increased flexibility of AC-PM with small overhead on hardware resource. FPGA implementation results demonstrate that the proposed AC-PM significantly outperforms the prior arts in both flexibility and area efficiency.
引用
收藏
页码:719 / 732
页数:14
相关论文
共 53 条
[41]   Compact ring-LWE cryptoprocessor [J].
Roy, Sujoy Sinha ;
Vercauteren, Frederik ;
Mentens, Nele ;
Donglong, Donald ;
Verbauwhede, Ingrid .
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2014, 8731 :371-391
[42]   Efficient Hardware Implementation of PQC Primitives and PQC algorithms Using High-Level Synthesis [J].
Soni, Deepraj ;
Karri, Ramesh .
2021 IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI (ISVLSI 2021), 2021, :296-301
[43]   A Highly Unified Reconfigurable Multicore Architecture to Speedup NTT/INTT for Homomorphic Polynomial Multiplication [J].
Su, Yang ;
Yang, Bai-Long ;
Yang, Chen ;
Yang, Ze-Peng ;
Liu, Yi-Wei .
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2022, 30 (08) :993-1006
[44]   ReMCA: A Reconfigurable Multi-Core Architecture for Full RNS Variant of BFV Homomorphic Evaluation [J].
Su, Yang ;
Yang, Bai-Long ;
Yang, Chen ;
Zhao, Song-Yin .
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2022, 69 (07) :2857-2870
[45]   High-Speed Modular Multiplier for Lattice-Based Cryptosystems [J].
Tan, Weihang ;
Case, Benjamin M. ;
Wang, Antian ;
Gao, Shuhong ;
Lao, Yingjie .
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2021, 68 (08) :2927-2931
[46]   An Ultra-Highly Parallel Polynomial Multiplier for the Bootstrapping Algorithm in a Fully Homomorphic Encryption Scheme [J].
Tan, Weihang ;
Case, Benjamin M. ;
Hu, Gengran ;
Gao, Shuhong ;
Lao, Yingjie .
JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2021, 93 (06) :643-656
[47]  
Wang W., 2020, IACR Transactions on Cryptographic Hardware and Embedded Systems, TCHES, P269, DOI 10.13154/tches.v2020.i3.269- 306
[48]  
Xilinx, 7 SER FPGAS DAT SHEE
[49]  
Xilinx, 7 SER DSP48E1 SLIC
[50]  
Xing Y., 2021, IACR Trans. Cryptogr. Hardw. Embed. Syst., V2021, P328, DOI [10.46586/tches.v2021.i2.328-356, DOI 10.46586/TCHES.V2021.I2.328-356]