Algorithmic optimizations of a conjugate gradient solver on shared memory architectures

Cited by: 1
Authors
Lof, Henrik [1]
Rantakokko, Jarmo [1]
Affiliations
[1] Uppsala Univ, Dept Informat Technol, Box 337, S-75105 Uppsala, Sweden
Keywords
OpenMP; Shared memory programming; Iterative solvers; Conjugate gradients; Bandwidth minimization; Reversed Cuthill-McKee;
DOI
10.1080/17445760600568139
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
OpenMP is an architecture-independent language for programming in the shared memory model. It is designed to be simple and powerful in terms of programming abstractions. Unfortunately, the architecture-independent abstractions sometimes come at the price of low parallel performance. This is especially true for applications with unstructured data access patterns running on distributed shared memory (DSM) systems. Here, proper data distribution and algorithmic optimizations play a vital role in performance. In this article, we investigate ways of improving the performance of an industrial-class conjugate gradient (CG) solver implemented in OpenMP and running on two types of shared memory systems. We evaluate bandwidth minimization, graph partitioning, and reformulations of the original algorithm that reduce the number of global barriers. Through a detailed analysis of barrier time and memory system performance, we find that bandwidth minimization is the most important optimization, reducing both L2 misses and remote memory accesses. On a uniform memory system, we get perfect scaling. On a NUMA system, the algorithmic optimizations improve performance significantly, leaving the system-dependent global reduction operations as the bottleneck.
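To illustrate the bandwidth minimization named in the abstract and keywords, the following is a minimal sketch of a Reverse Cuthill-McKee style reordering in plain Python. It is not the authors' implementation: the function names (`bandwidth`, `reverse_cuthill_mckee`, `path_graph`) are hypothetical, the start vertex is chosen simply as a minimum-degree vertex rather than by a pseudo-peripheral search, and the test graph is an invented toy example.

```python
from collections import deque


def bandwidth(adj, order):
    """Largest |pos[u] - pos[v]| over all edges (u, v) under the ordering."""
    pos = {v: k for k, v in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]), default=0)


def reverse_cuthill_mckee(adj):
    """Return an RCM-style ordering of an undirected graph.

    adj maps each vertex to the set of its neighbours, i.e. the
    off-diagonal nonzero pattern of a symmetric sparse matrix.
    """
    degree = {v: len(adj[v]) for v in adj}
    visited = set()
    order = []
    # Handle each connected component, starting from a low-degree vertex
    # (a simplification assumed here for brevity).
    for start in sorted(adj, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Enqueue unvisited neighbours by increasing degree.
            for w in sorted(adj[v] - visited, key=lambda w: (degree[w], w)):
                visited.add(w)
                queue.append(w)
    order.reverse()  # reversing the Cuthill-McKee order gives RCM
    return order


def path_graph(labels):
    """Build a path graph whose consecutive labels are joined by edges."""
    adj = {v: set() for v in labels}
    for a, b in zip(labels, labels[1:]):
        adj[a].add(b)
        adj[b].add(a)
    return adj


# Toy example: a chain of 8 unknowns with scrambled numbering.
adj = path_graph([0, 5, 2, 7, 4, 1, 6, 3])
natural = sorted(adj)
rcm = reverse_cuthill_mckee(adj)
print(bandwidth(adj, natural), bandwidth(adj, rcm))  # prints: 5 1
```

A smaller bandwidth keeps the vector entries touched by each matrix row close together in memory, which is consistent with the abstract's observation that this optimization reduces both L2 misses and remote memory accesses.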
Pages: 345-363
Page count: 19