Optimizing Lattice Basis Reduction Algorithm on ARM V8 Processors

Cited by: 0
Authors
Cao, Ronghui [1 ]
Wang, Julong [1 ]
Zheng, Liming [2 ]
Zhou, Jincheng [1 ]
Wang, Haodong [1 ]
Xiao, Tiaojie [3 ]
Gong, Chunye [3 ,4 ]
Affiliations
[1] Changsha Univ Sci & Technol, Sch Comp & Commun Engn, Changsha 410073, Peoples R China
[2] Natl Univ Def Technol, Coll Elect Sci, Changsha 410073, Peoples R China
[3] Natl Univ Def Technol, Coll Comp, Changsha 410073, Peoples R China
[4] Natl Supercomp Ctr Tianjin, Tianjin 300457, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2025, Vol. 15, Issue 04
Keywords
lattice reduction; parallel optimization; Tianhe; ARM V8;
DOI
10.3390/app15042021
Chinese Library Classification
O6 [Chemistry];
Discipline Classification Code
0703;
Abstract
The LLL (Lenstra-Lenstra-Lovász) algorithm is an important method for lattice basis reduction and has broad applications in computer algebra, cryptography, number theory, and combinatorial optimization. However, current LLL implementations face challenges such as inadequate adaptation to domestic supercomputers and low efficiency. To improve the efficiency of the LLL algorithm in practical applications, this research focuses on parallel optimization of the LLL_FP (double-precision floating-point) algorithm from the NTL library on the domestic Tianhe supercomputer, which uses the Phytium ARM V8 processor. The optimization begins by vectorizing the Gram-Schmidt coefficient calculation and row transformation with the SIMD instruction set of the Phytium chip, which significantly improves computational efficiency. Further assembly-level optimization fully exploits the low-level instructions of the Phytium processor and increases execution speed. For memory access, data prefetch techniques load the required data ahead of computation, reducing cache misses and accelerating data processing. To further enhance performance, loop unrolling is applied to the core loop, allowing more operations per iteration. Experimental results show that the optimized LLL_FP algorithm achieves up to a 42% improvement in single-core efficiency over the serial LLL_FP algorithm, with a minimum improvement of 34% and an average of 38%. This study provides a more efficient solution for large-scale lattice basis reduction and demonstrates the potential of the LLL algorithm in ARM V8 high-performance computing environments.
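As an illustration of the techniques summarized in the abstract, the sketch below shows how a double-precision inner product of the kind used in Gram-Schmidt coefficient updates could be written with ARM V8 (AArch64) NEON intrinsics, combining SIMD vectorization, software prefetching, and 4-way loop unrolling. The function name, prefetch distance, and unroll factor are illustrative assumptions; this is not code from the paper or from NTL's LLL_FP.

    #include <arm_neon.h>
    #include <cstddef>

    // Hypothetical helper: double-precision dot product for Gram-Schmidt
    // coefficient computation, vectorized with AArch64 NEON intrinsics.
    // Illustrative only; parameters are not tuned for the Phytium chip.
    double dot_product_neon(const double* a, const double* b, std::size_t n) {
        // Four independent accumulators break the FMA dependency chain.
        float64x2_t acc0 = vdupq_n_f64(0.0);
        float64x2_t acc1 = vdupq_n_f64(0.0);
        float64x2_t acc2 = vdupq_n_f64(0.0);
        float64x2_t acc3 = vdupq_n_f64(0.0);

        std::size_t i = 0;
        for (; i + 8 <= n; i += 8) {
            // Prefetch data several cache lines ahead to hide memory latency
            // (a distance of 64 doubles is an illustrative choice).
            __builtin_prefetch(a + i + 64, 0, 3);
            __builtin_prefetch(b + i + 64, 0, 3);

            // Loop unrolled 4x: four fused multiply-accumulates per iteration,
            // two doubles per 128-bit NEON register.
            acc0 = vfmaq_f64(acc0, vld1q_f64(a + i),     vld1q_f64(b + i));
            acc1 = vfmaq_f64(acc1, vld1q_f64(a + i + 2), vld1q_f64(b + i + 2));
            acc2 = vfmaq_f64(acc2, vld1q_f64(a + i + 4), vld1q_f64(b + i + 4));
            acc3 = vfmaq_f64(acc3, vld1q_f64(a + i + 6), vld1q_f64(b + i + 6));
        }

        // Combine partial sums and reduce the vector accumulator to a scalar.
        float64x2_t acc = vaddq_f64(vaddq_f64(acc0, acc1), vaddq_f64(acc2, acc3));
        double sum = vgetq_lane_f64(acc, 0) + vgetq_lane_f64(acc, 1);

        // Scalar tail for lengths not divisible by 8.
        for (; i < n; ++i)
            sum += a[i] * b[i];
        return sum;
    }

Using multiple accumulators keeps the floating-point units busy despite FMA latency, and the scalar tail handles vector lengths that are not a multiple of the unroll width; the same pattern applies to the row-transformation step (b_k -= mu * b_j), with a fused multiply-subtract in place of the accumulation.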
Pages: 16