A Highly-efficient Lattice-based Post-Quantum Cryptography Processor for IoT Applications

Cited by: 0
Authors
Ye Z. [1, 3]
Song R. [1 ]
Zhang H. [1 ]
Chen D. [2 ]
Cheung R.C.-C. [3 ]
Huang K. [1 ]
Affiliations
[1] Zhejiang University, Hangzhou
[2] BNU-HKBU United International College, Zhuhai
Source
IACR Transactions on Cryptographic Hardware and Embedded Systems | 2024 / Vol. 2024 / Issue 2
Funding
National Natural Science Foundation of China
Keywords
Internet-of-Things; Lattice-Based Cryptography; Post-quantum Cryptography; RISC-V; Single-Instruction-Multiple-Data;
DOI
10.46586/tches.v2024.i2.130-153
Abstract
Lattice-Based Cryptography (LBC) schemes such as CRYSTALS-Kyber and CRYSTALS-Dilithium have been selected for standardization in the NIST Post-Quantum Cryptography standard. However, implementing these schemes on resource-constrained Internet-of-Things (IoT) devices is challenging because a design must balance efficiency, power consumption, area overhead, and the flexibility to support various operations and parameter settings. Some existing ASIC designs that prioritize low power and area cannot achieve optimal performance efficiency, which makes them impractical for battery-powered devices. Custom hardware accelerators in prior co-processor and processor designs have limited applicability and flexibility, and they incur significant area and power overheads for IoT devices. To address these challenges, this paper presents an efficient lattice-based cryptography processor with customized Single-Instruction-Multiple-Data (SIMD) instructions. First, the proposed SIMD architecture supports efficient parallel execution of various polynomial operations in 256-bit mode and acceleration of Keccak in 320-bit mode, with both modes efficiently reusing hardware resources. Additionally, we introduce data shuffling hardware units to resolve data dependencies within SIMD data. To further enhance performance, we design a dual-issue path for memory accesses, together with corresponding software design methodologies, to reduce the impact of data load/store blocking. Through a hardware/software co-design approach, the proposed processor achieves high efficiency in supporting all operations in lattice-based cryptography schemes. Evaluations of Kyber and Dilithium show that the proposed processor achieves over 10× speedup compared with the baseline RISC-V processor and over 5× speedup versus ARM Cortex M4 implementations, making it a promising solution for securing IoT communications and storage. Moreover, silicon synthesis results show that the design can run at 200 MHz while consuming 2.01 mW for Kyber KEM 512 and 2.13 mW for Dilithium 2, outperforming state-of-the-art works in terms of PPAP (Performance × Power × Area). © 2024, Ruhr-University of Bochum. All rights reserved.
Pages: 130-153
Page count: 23