Algorithmic optimizations of a conjugate gradient solver on shared memory architectures

被引：1

作者：

Lof, Henrik ^{[1
]}

Rantakokko, Jarmo ^{[1
]}

机构：

[1] Uppsala Univ, Dept Informat Technol, Box 337, S-75105 Uppsala, Sweden

来源：

INTERNATIONAL JOURNAL OF PARALLEL EMERGENT AND DISTRIBUTED SYSTEMS | 2006年 / 21卷 / 05期

关键词：

OpenMP; Shared memory programming; Iterative solvers; Conjugate gradients; Bandwidth minimization; Reversed Cuthill-McKee;

D O I：

10.1080/17445760600568139

中图分类号：

TP301 [理论、方法];

学科分类号：

081202 ;

摘要：

OpenMP is an architecture-independent language for programming in the shared memory model. OpenMP is designed to be simple and powerful in terms of programming abstractions. Unfortunately, the architecture-independent abstractions sometimes come with the price of low parallel performance. This is especially true for applications with an unstructured data access pattern running on distributed shared memory systems (DSM). Here, proper data distribution and algorithmic optimizations play a vital role for performance. In this article, we have investigated ways of improving the performance of an industrial class conjugate gradient (CG) solver, implemented in OpenMP running on two types of shared memory systems. We have evaluated bandwidth minimization, graph partitioning and reformulations of the original algorithm reducing global barriers. By a detailed analysis of barrier time and memory system performance, we found that bandwidth minimization is the most important optimization reducing both L2 misses and remote memory accesses. On a uniform memory system, we get perfect scaling. On a NUMA system, the performance is significantly improved with the algorithmic optimizations leaving the system dependent global reduction operations as a bottleneck.

引用

页码：345 / 363

页数：19

共 50 条

[31] Matrix Decomposition Based Conjugate Gradient Solver for Poisson Equation
Liu, Hang
Seo, Jung-Hee
Mittal, Rajat
Huang, H. Howie
2012 SC COMPANION: HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SCC), 2012, : 1501 - 1501
[32] Accelerating Preconditioned Conjugate Gradient solver in Wind Field Calculation
Sanjuan, Gemma
Margalef, Tomas
Cortes, Ana
2016 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS 2016), 2016, : 294 - 301
[33] THE LIMITED MEMORY CONJUGATE GRADIENT METHOD
Hager, William W.
Zhang, Hongchao
SIAM JOURNAL ON OPTIMIZATION, 2013, 23 (04) : 2150 - 2168
[34] Memory retention in conjugate gradient methods
Dixon, L
LINEAR AND NONLINEAR CONJUGATE GRADIENT-RELATED METHODS, 1996, : 101 - 106
[35] A study of shared-memory parallelism in a multifrontal solver
L'Excellent, Jean-Yves
Sid-Lakhdar, Wissam M.
PARALLEL COMPUTING, 2014, 40 (3-4) : 34 - 46
[36] ATM SHARED-MEMORY SWITCHING ARCHITECTURES
GARCIAHARO, J
JAJSZCZYK, A
IEEE NETWORK, 1994, 8 (04): : 18 - 26
[37] Ray casting on shared-memory architectures
Inktomi Corp, San Mateo, United States
IEEE Concurrency, 1 (20-35):
[38] Analytic evaluation of shared-memory architectures
Sorin, DJ
Lemon, JL
Eater, DL
Vernon, MK
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2003, 14 (02) : 166 - 180
[39] Performance of scalable shared-memory architectures
Motlagh, BS
DeMara, RF
JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2000, 10 (1-2) : 1 - 22
[40] Binding time in distributed shared memory architectures
Kong, J
Lee, G
1998 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING - PROCEEDINGS, 1998, : 198 - 206

← 1 2 3 4 5 →