Algorithmic optimizations of a conjugate gradient solver on shared memory architectures

被引:1
|
作者
Lof, Henrik [1 ]
Rantakokko, Jarmo [1 ]
机构
[1] Uppsala Univ, Dept Informat Technol, Box 337, S-75105 Uppsala, Sweden
关键词
OpenMP; Shared memory programming; Iterative solvers; Conjugate gradients; Bandwidth minimization; Reversed Cuthill-McKee;
D O I
10.1080/17445760600568139
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
OpenMP is an architecture-independent language for programming in the shared memory model. OpenMP is designed to be simple and powerful in terms of programming abstractions. Unfortunately, the architecture-independent abstractions sometimes come with the price of low parallel performance. This is especially true for applications with an unstructured data access pattern running on distributed shared memory systems (DSM). Here, proper data distribution and algorithmic optimizations play a vital role for performance. In this article, we have investigated ways of improving the performance of an industrial class conjugate gradient (CG) solver, implemented in OpenMP running on two types of shared memory systems. We have evaluated bandwidth minimization, graph partitioning and reformulations of the original algorithm reducing global barriers. By a detailed analysis of barrier time and memory system performance, we found that bandwidth minimization is the most important optimization reducing both L2 misses and remote memory accesses. On a uniform memory system, we get perfect scaling. On a NUMA system, the performance is significantly improved with the algorithmic optimizations leaving the system dependent global reduction operations as a bottleneck.
引用
收藏
页码:345 / 363
页数:19
相关论文
共 50 条
  • [31] Matrix Decomposition Based Conjugate Gradient Solver for Poisson Equation
    Liu, Hang
    Seo, Jung-Hee
    Mittal, Rajat
    Huang, H. Howie
    2012 SC COMPANION: HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SCC), 2012, : 1501 - 1501
  • [32] Accelerating Preconditioned Conjugate Gradient solver in Wind Field Calculation
    Sanjuan, Gemma
    Margalef, Tomas
    Cortes, Ana
    2016 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS 2016), 2016, : 294 - 301
  • [33] THE LIMITED MEMORY CONJUGATE GRADIENT METHOD
    Hager, William W.
    Zhang, Hongchao
    SIAM JOURNAL ON OPTIMIZATION, 2013, 23 (04) : 2150 - 2168
  • [34] Memory retention in conjugate gradient methods
    Dixon, L
    LINEAR AND NONLINEAR CONJUGATE GRADIENT-RELATED METHODS, 1996, : 101 - 106
  • [35] A study of shared-memory parallelism in a multifrontal solver
    L'Excellent, Jean-Yves
    Sid-Lakhdar, Wissam M.
    PARALLEL COMPUTING, 2014, 40 (3-4) : 34 - 46
  • [36] ATM SHARED-MEMORY SWITCHING ARCHITECTURES
    GARCIAHARO, J
    JAJSZCZYK, A
    IEEE NETWORK, 1994, 8 (04): : 18 - 26
  • [37] Ray casting on shared-memory architectures
    Inktomi Corp, San Mateo, United States
    IEEE Concurrency, 1 (20-35):
  • [38] Analytic evaluation of shared-memory architectures
    Sorin, DJ
    Lemon, JL
    Eater, DL
    Vernon, MK
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2003, 14 (02) : 166 - 180
  • [39] Performance of scalable shared-memory architectures
    Motlagh, BS
    DeMara, RF
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2000, 10 (1-2) : 1 - 22
  • [40] Binding time in distributed shared memory architectures
    Kong, J
    Lee, G
    1998 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING - PROCEEDINGS, 1998, : 198 - 206