Tiled QR Decomposition and Its Optimization on CPU and GPU Computing System

Times cited: 4

Authors:

Kim, Dongjin [1]

Park, Kyu-Ho [1]

Affiliation:

[1] Korea Adv Inst Sci & Technol, Comp Engn Res Lab, Taejon 305701, South Korea

Source:

2013 42ND ANNUAL INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP) | 2013


DOI:

10.1109/ICPP.2013.88

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

Heterogeneous computing systems come in many forms, and the most useful one is the CPU and GPU computing system. On this system, we run QR decomposition, which expresses a real matrix as a product of two matrices. Tiled QR decomposition is a parallelized version of QR decomposition; because the computing devices are heterogeneous and communication is costly, its main issue is deciding which device each tile is assigned to. The goal of this study is to optimize the tile distribution and the tiled QR decomposition operation mathematically for a given system. We select the main computing device for the main steps of the algorithm, optimize the number of devices, and optimize the tile distribution among the devices using a distribution guide array. Our evaluation confirms that the method scales well and that the optimization process maximizes tiled QR decomposition performance.
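As a reading aid for the abstract, the following is a minimal sketch of plain (non-tiled) QR decomposition by Householder reflections, written in pure Python. It only illustrates the factorization A = QR, with Q orthogonal and R upper triangular; it is not the tiled algorithm or the tile distribution scheme of the paper, and the function names `householder_qr` and `matmul` are hypothetical helpers chosen for this illustration.

```python
import math


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def householder_qr(a):
    """Return (q, r) such that a = q * r.

    q is an m x m orthogonal matrix and r is an m x n upper triangular
    matrix. Illustrative sketch only (assumed helper, not from the paper).
    """
    m, n = len(a), len(a[0])
    r = [[float(x) for x in row] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for k in range(min(m - 1, n)):
        # Householder vector v that zeroes column k below the diagonal.
        x = [r[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column is already zero, nothing to eliminate
        v = x[:]
        v[0] += math.copysign(norm_x, x[0])
        vv = sum(t * t for t in v)
        # r <- H r, with H = I - 2 v v^T / (v^T v) acting on rows k..m-1.
        for j in range(n):
            f = 2.0 * sum(v[i] * r[k + i][j] for i in range(len(v))) / vv
            for i in range(len(v)):
                r[k + i][j] -= f * v[i]
        # q <- q H, so that q accumulates H_1 H_2 ... H_k.
        for i in range(m):
            f = 2.0 * sum(q[i][k + j] * v[j] for j in range(len(v))) / vv
            for j in range(len(v)):
                q[i][k + j] -= f * v[j]
    return q, r
```

Calling `q, r = householder_qr([[1, 2], [3, 4], [5, 6]])` and then `matmul(q, r)` should reproduce the input up to floating-point rounding. A tiled variant applies the same kind of orthogonal transformations block by block, which is what makes the assignment of tiles to devices a scheduling question.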


Pages: 744-753

Page count: 10

