Model guided algorithm optimization for tridiagonal solver on many-core architectures

被引:1
作者
Liu, Kan [1 ]
Wang, Xinliang [2 ]
Xue, Wei [1 ]
机构
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
[2] Huawei Technol Co Ltd, Shenzhen, Peoples R China
基金
国家重点研发计划;
关键词
Tridiagonal solver; Many-core architectures; Parallel numerical algorithm; SYSTEM;
D O I
10.1007/s42514-022-00124-w
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Tridiagonal solver is an important kernel used in a wide range of applications and has been well supported in mainstream numerical libraries. Quite a few parallel algorithms have been developed, but the best-performing algorithm may vary across architectures as well as input sizes. Targeting this algorithm choice challenge, we present a model guided approach to determine the best batched tridiagonal algorithm for various many-core architectures and input sizes efficiently and effectively, managing to achieve the accuracy of algorithm choice over 92% for important architectures. Following the approach, we propose a hybrid CR-PCR-pThomas algorithm to well leverage computation and memory access. The hybrid algorithm outperforms the current state-of-the-art alternatives by up to 32% and 21% on Pascal P100 and Knights Landing respectively. On SW26010 that powers the No.6 supercomputer Sunway TaihuLight, we present an improved cyclic reduction algorithm Dist-CR. The proposed Dist-CR outperforms Thomas algorithm by speedups up to 2.14x.
引用
收藏
页码:43 / 55
页数:13
相关论文
共 31 条
[1]  
Chang LW, 2011, INT CONF ACOUST SPEE, P1621
[2]  
Chang Li-Wen., 2012, INT CONF HIGH PERFOR, P1
[3]  
Davidson A., 2011, Proceedings of the 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2011), P956, DOI 10.1109/IPDPS.2011.92
[4]  
Davidson Andrew., 2011, Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, p4:1, DOI [DOI 10.1145/1964179.1964185, 10.1145/1964179. 1964185.]
[5]   New Tridiagonal Systems Solvers on GPU architectures [J].
Dieguez, Adrian P. ;
Amor, Margarita ;
Doallo, Ramon .
2015 IEEE 22ND INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), 2015, :85-93
[6]   A RECURSIVE DOUBLING-ALGORITHM FOR SOLUTION OF TRIDIAGONAL SYSTEMS ON HYPERCUBE MULTIPROCESSORS [J].
EGECIOGLU, O ;
KOC, CK ;
LAUB, AJ .
JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 1989, 27 (1-2) :95-108
[7]   Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed-Precision Multigrid [J].
Goeddeke, Dominik ;
Strzodka, Robert .
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2011, 22 (01) :22-32
[8]  
Greenbaum A., 1997, Society for Industrial and Applied Mathematics
[9]   Evaluation of vertical coordinate and vertical mixing algorithms in the HYbrid-Coordinate Ocean Model (HYCOM) [J].
Halliwell, GR .
OCEAN MODELLING, 2004, 7 (3-4) :285-322
[10]  
Hee-Seok Kim, 2011, 2011 International Conference on Parallel Processing, P444, DOI 10.1109/ICPP.2011.41