An Implementation of Block Conjugate Gradient Algorithm on CPU-GPU Processors

Cited by: 4
Authors
Ji, Hao [1]
Sosonkina, Masha [2]
Li, Yaohang [1]
Affiliations
[1] Old Dominion Univ, Dept Comp Sci, Norfolk, VA 23529 USA
[2] Old Dominion Univ, Dept Modeling Simulat & Visualizat Engn, Norfolk, VA 23529 USA
Source
2014 HARDWARE-SOFTWARE CO-DESIGN FOR HIGH PERFORMANCE COMPUTING (CO-HPC) | 2014
Keywords
Block Conjugate Gradient; Multi-core CPU; Graphics Processing Unit; Intel Xeon Phi; Performance Evaluation;
DOI
10.1109/Co-HPC.2014.10
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we investigate the implementation of the Block Conjugate Gradient (BCG) algorithm on CPU-GPU processors. By analyzing the performance of the various matrix operations in BCG, we identify the construction of new search direction matrices as the main performance bottleneck. Replacing the QR decomposition with an eigendecomposition of a small matrix remedies the problem by reducing the computational cost of generating orthogonal search directions. Moreover, a hybrid (offload) computing scheme is designed to enable the BCG implementation to handle linear systems with large, sparse coefficient matrices that cannot fit in GPU memory. The hybrid scheme offloads matrix operations to the GPU while helping to hide the CPU-GPU memory transaction overhead. We compare the performance of our BCG implementation with that of a CPU implementation using Intel Xeon Phi coprocessors in automatic offload mode. With a sufficient number of right-hand sides, the CPU-GPU implementation of BCG reaches a speedup of 2.61 over the CPU-only implementation, which is significantly higher than that of the CPU-Intel Xeon Phi implementation.
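The abstract does not say which small matrix is eigendecomposed. As a rough illustration only, the sketch below assumes it is the s-by-s Gram matrix G = P^T P of a tall n-by-s block P, and uses G = V diag(w) V^T to form Q = P V diag(w)^(-1/2), whose columns are orthonormal. The function name `orthonormalize_via_eig` and the tolerance `tol` are hypothetical, and this is not necessarily the authors' exact procedure.

```python
import numpy as np

def orthonormalize_via_eig(P, tol=1e-12):
    """Return Q with orthonormal columns spanning the column space of P.

    Assumption (not stated in the abstract): the "small matrix" is the
    s-by-s Gram matrix G = P^T P, where P is n-by-s with n >> s.
    """
    G = P.T @ P                    # s x s, symmetric positive semidefinite
    w, V = np.linalg.eigh(G)       # G = V @ diag(w) @ V.T, w ascending
    keep = w > tol * w.max()       # drop numerically dependent directions
    # Scale each kept eigenvector by 1/sqrt(eigenvalue), then map through P.
    return P @ (V[:, keep] / np.sqrt(w[keep]))

# Small demonstration on a random tall block (n = 1000, s = 8).
rng = np.random.default_rng(0)
P = rng.standard_normal((1000, 8))
Q = orthonormalize_via_eig(P)
orth_error = np.linalg.norm(Q.T @ Q - np.eye(Q.shape[1]))
```

The only dense factorization here acts on an s-by-s matrix; the remaining work is two tall matrix products, which tend to map well onto GPU BLAS routines. One trade-off of a Gram-matrix approach is that it squares the condition number of P, so it can lose accuracy relative to QR when the block is nearly rank deficient.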
Pages: 72-77
Page count: 6
Related papers
39 in total
  • [1] Hybrid CPU-GPU implementation of the transformed spatial domain channel estimation algorithm for mmWave MIMO systems
    Lloria, Diego
    Aviles, Pablo M.
    Belloch, Jose A.
    Roger, Sandra
    Botella-Mascarell, Carmen
    Lindoso, Almudena
    JOURNAL OF SUPERCOMPUTING, 2023, 79 (09) : 9371 - 9382
  • [2] A survey on techniques for cooperative CPU-GPU computing
    Raju, K.
    Chiplunkar, Niranjan N.
    SUSTAINABLE COMPUTING-INFORMATICS & SYSTEMS, 2018, 19 : 72 - 85
  • [3] A multi-grained distributed implementation of the parallel Block Conjugate Gradient algorithm
    Murli, A.
    D'Amore, L.
    Laccetti, G.
    Gregoretti, F.
    Oliva, G.
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2010, 22 (15) : 2053 - 2072
  • [4] CPU-GPU Hybrid Parallel Binomial American Option Pricing
    Zhang, Nan
    Lim, Eng Gee
    Man, Ka Lok
    Lei, Chi-Un
    INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTIST, IMECS 2012, VOL II, 2012, : 1157 - 1162
  • [5] Exploring Query Processing on CPU-GPU Integrated Edge Device
    Liu, Jiesong
    Zhang, Feng
    Li, Hourun
    Wang, Dalin
    Wan, Weitao
    Fang, Xiaokun
    Zhai, Jidong
    Du, Xiaoyong
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (12) : 4057 - 4070
  • [6] Binomial American Option Pricing on CPU-GPU Hetergenous System
    Zhang, Nan
    Lei, Chi-Un
    Man, Ka Lok
    ENGINEERING LETTERS, 2012, 20 (03) : 279 - 285
  • [7] Fast Snippet Generation Based On CPU-GPU Hybrid System
    Liu, Ding
    Li, Ruixuan
    Gu, Xiwu
    Wen, Kunmei
    He, Heng
    Gao, Guoqiang
    2011 IEEE 17TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2011, : 252 - 259
  • [8] Hybrid-Smash: A Heterogeneous CPU-GPU Compression Library
    Penaranda, Cristian
    Reano, Carlos
    Silla, Federico
    IEEE ACCESS, 2024, 12 : 32706 - 32723
  • [9] PARALLEL BINOMIAL AMERICAN OPTION PRICING ON CPU-GPU HYBRID PLATFORM
    Zhang, Nan
    Lei, Chi-Un
    Man, Ka Lok
    IAENG TRANSACTIONS ON ELECTRICAL ENGINEERING, VOL 1, 2012, : 161 - 174
  • [10] CoopCL: Cooperative Execution of OpenCL Programs on Heterogeneous CPU-GPU Platforms
    Moren, Konrad
    Goehringer, Diana
    2020 28TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING (PDP 2020), 2020, : 224 - 231