Performance Analysis of Preconditioned Conjugate Gradient Solver on Heterogeneous (Multi-CPUs/Multi-GPUs) Architecture

Times cited: 0

Authors:

Kasmi, Najlae [1]

Zbakh, Mostapha [1]

Haouari, Amine [1]

Affiliation:

[1] Mohammed V Univ, ENSIAS, Rabat, Morocco

Source:

CLOUD COMPUTING AND BIG DATA: TECHNOLOGIES, APPLICATIONS AND SECURITY | 2019 / Vol. 49


DOI:

10.1007/978-3-319-97719-5_20

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

The solution of systems of linear equations is one of the most CPU-intensive steps in engineering and simulation applications, and it can benefit greatly from the many processing cores and the vectorisation available on today's parallel computers. Our objective is to evaluate the performance of one such solver, the conjugate gradient method, on a hybrid computing platform (Multi-GPU/Multi-CPU). We consider the preconditioned conjugate gradient (PCG) solver since it exhibits the main features of such problems. Indeed, the relative performance of CPUs and GPUs depends strongly on the subroutine: GPUs, for instance, are much more efficient at processing regular kernels such as matrix-vector multiplications than irregular kernels such as matrix factorizations. In this context, one solution is to rely on dynamic scheduling and resource allocation mechanisms such as those provided by StarPU. In this chapter, we evaluate the performance of the dynamic schedulers provided by StarPU and analyse the scalability of the PCG algorithm. We show how to choose the best combination of resources effectively in order to improve performance.
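To make the mix of kernels in PCG concrete, here is a minimal sketch of the iteration in Python. It assumes a dense symmetric positive-definite matrix and a Jacobi (diagonal) preconditioner; the abstract does not say which preconditioner the chapter uses, so that choice and the names `pcg`, `matvec` and `dot` are illustrative only. Each iteration combines a matrix-vector product, dot products and vector updates (regular kernels) with one preconditioner application, which is where irregular work such as triangular solves would appear if an incomplete factorization were used instead of Jacobi.

```python
import math


def dot(u, v):
    # Reduction kernel: inner product of two vectors.
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    # Regular kernel: dense matrix-vector product.
    return [dot(row, x) for row in A]


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient for a dense SPD matrix A (sketch)."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner M^-1 (assumed)
    x = [0.0] * n
    r = list(b)  # r0 = b - A x0 with x0 = 0
    b_norm = math.sqrt(dot(b, b)) or 1.0
    if math.sqrt(dot(r, r)) / b_norm < tol:
        return x, 0
    z = [inv_diag[i] * r[i] for i in range(n)]  # z0 = M^-1 r0
    p = list(z)
    rz = dot(r, z)
    for k in range(max_iter):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]  # axpy update of the solution
        r = [r[i] - alpha * Ap[i] for i in range(n)]  # axpy update of the residual
        if math.sqrt(dot(r, r)) / b_norm < tol:
            return x, k + 1
        z = [inv_diag[i] * r[i] for i in range(n)]  # apply preconditioner
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]  # new search direction
        rz = rz_new
    return x, max_iter


x, iters = pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(x, iters)  # exact solution is [1/11, 7/11]
```

In exact arithmetic CG converges in at most n iterations for an n-by-n SPD matrix, so this 2-by-2 example finishes in two.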
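The point that CPU and GPU performance depends on the subroutine can be illustrated with a toy list scheduler: each task goes to the worker that would finish it earliest according to a per-kernel cost table. This is only a sketch in the spirit of performance-model-based policies such as StarPU's `dmda`, not StarPU's API. It ignores task dependencies and data transfers, and the `Worker` class, worker names and costs below are hypothetical, chosen to mimic a GPU that is fast on matrix-vector products and slow on factorizations.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    cost: dict  # hypothetical execution time per kernel type (arbitrary units)
    ready_at: float = 0.0  # time at which the worker becomes idle
    assigned: list = field(default_factory=list)


def schedule(tasks, workers):
    """Greedy earliest-finish-time assignment of independent tasks.

    Ignores dependencies and data transfers (simplifying assumption).
    Returns the makespan, i.e. the time when the last worker becomes idle.
    """
    for kernel in tasks:
        best = min(workers, key=lambda w: w.ready_at + w.cost[kernel])
        best.ready_at += best.cost[kernel]
        best.assigned.append(kernel)
    return max(w.ready_at for w in workers)


# Hypothetical costs: the GPU is fast on regular SpMV, slow on factorization.
cpu = Worker("cpu0", {"spmv": 4.0, "factor": 2.0})
gpu = Worker("gpu0", {"spmv": 1.0, "factor": 5.0})
makespan = schedule(["factor", "spmv", "spmv", "spmv", "factor"], [cpu, gpu])
print(makespan, cpu.assigned, gpu.assigned)
```

With these assumed costs the scheduler sends both factorizations to the CPU and all three matrix-vector products to the GPU, for a makespan of 4 time units; placing every task on the GPU would take 13.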


Pages: 318-336

Number of pages: 19
