Solving systems of linear equations on the CELL processor using Cholesky factorization

被引:48
作者
Kurzak, Jakub [1 ]
Buttari, Alfredo [1 ]
Dongarra, Jack [1 ]
机构
[1] Univ Tennessee, Dept Elect Engn & Comp Sci, Knoxville, TN 37996 USA
基金
美国国家科学基金会;
关键词
parallel algorithms; numerical linear algebra; CELL broadband engine;
D O I
10.1109/TPDS.2007.70813
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
The Sony/Toshiba/IBM (STI) CELL processor introduces pioneering solutions in processor architecture. At the same time, it presents new challenges for the development of numerical algorithms. One is the effective exploitation of the differential between the speed of single- and double-precision arithmetic; the other is the efficient parallelization between the short-vector Single-Instruction, Multiple-Data (SIMD) cores. The first challenge is addressed by utilizing the well-known technique of iterative refinement for the solution of a dense symmetric positive definite system of linear equations, resulting in a mixed-precision algorithm, which delivers double-precision accuracy while performing the bulk of the work in single precision. The main contribution of this paper lies in addressing the second challenge by successful thread-level parallelization, exploiting fine-grained task granularity and a lightweight decentralized synchronization. The implementation of the computationally intensive sections gets within 90 percent of the peak floating-point performance, while the implementation of the memory-intensive sections reaches within 90 percent of the peak memory bandwidth. On a single CELL processor, the algorithm achieves more than 170 Gflops when solving a symmetric positive definite system of linear equations in single precision and more than 150 Gflops when delivering the result in double-precision accuracy.
引用
收藏
页码:1175 / 1185
页数:11
相关论文
共 17 条
[1]  
AGARWAL RC, 1988, P IFIP WG 2 5 WORK C, P217
[2]  
AGARWAL RC, 1989, P ACM IEEE C SUP
[3]   A fully portable high performance minimal storage hybrid format Cholesky algorithm [J].
Andersen, BS ;
Gunnels, JA ;
Gustavson, FG ;
Reid, JK ;
Wasniewski, J .
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2005, 31 (02) :201-227
[4]  
[Anonymous], J ACM, DOI [10.1145/321386.321394, DOI 10.1145/321386.321394]
[5]  
[Anonymous], UTCS07595
[6]   Extrathecal intraradicular nerve sheath tumor [J].
Celli, P ;
Trillò, G ;
Ferrante, L .
JOURNAL OF NEUROSURGERY-SPINE, 2005, 3 (01) :1-11
[7]  
CHEN T, 2005, CELL BROADBAND ENGIN
[8]  
DONGARRA JJ, 1998, SOFTW ENVIRONM TOOL, P1
[9]  
Higham N. J., 1996, ACCURACY STABILITY N
[10]  
HOFSTEE HP, 2005, P 11 INT S HIGH PERF