A compression-based memory-efficient optimization for out-of-core GPU stencil computation

Cited by: 3
Authors
Shen, Jingcheng [1]
Long, Linbo [1]
Deng, Xin [1]
Okita, Masao [2]
Ino, Fumihiko [2]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Coll Comp Sci & Technol, 2 Chongwen Rd, Chongqing 400065, Peoples R China
[2] Osaka Univ, Grad Sch Informat Sci & Technol, 1-5 Yamadaoka, Suita, Osaka 5650871, Japan
Funding
National Natural Science Foundation of China; Japan Society for the Promotion of Science
Keywords
On-the-fly compression; Stencil computation; Out-of-core; GPU; Lossy compression; Algorithm
DOI
10.1007/s11227-023-05103-8
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
A code for out-of-core stencil computation manages data that exceeds the memory capacity of a GPU. However, such a code requires frequent data transfers between the CPU and GPU, which often impede overall performance. In this work, we propose a compression-based, memory-efficient method to accelerate out-of-core stencil codes. First, an on-the-fly compression technique is integrated into the out-of-core computation to reduce CPU-GPU data transfers. Second, a single-working-buffer strategy is employed to reduce GPU memory usage, enabling more data to be stored on the GPU for reuse and thus more temporal blocking steps. Experimental results demonstrate that the proposed method reduces GPU memory usage by 21%, creating space to double the number of temporal blocking steps compared to the codes without compression. The proposed method is shown to help high-order, data-transfer-bound stencil codes achieve speedups of up to 2.09x in single precision and up to 1.92x in double precision on an NVIDIA Tesla V100 GPU, compared with the codes without compression.
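The interplay of chunked out-of-core processing, temporal blocking, and compressed transfers described in the abstract can be sketched on the CPU as a toy model. This is not the authors' implementation: the 3-point averaging stencil, the function names, and the chunk sizes are all hypothetical, `zlib` (lossless) stands in for whatever on-the-fly compressor the paper uses, and the single-working-buffer strategy and on-GPU data reuse are not modeled.

```python
import zlib
from array import array

RADIUS = 1  # radius of the hypothetical 3-point stencil


def stencil_step(u):
    """One averaging step; the first and last cells are held fixed."""
    out = list(u)
    for i in range(RADIUS, len(u) - RADIUS):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out


def compress(values):
    """Pack doubles and compress them (stand-in for the transfer codec)."""
    return zlib.compress(array("d", values).tobytes())


def decompress(blob):
    """Inverse of compress(); returns a list of floats."""
    a = array("d")
    a.frombytes(zlib.decompress(blob))
    return a.tolist()


def out_of_core_run(host, chunk, steps_per_block, blocks):
    """Process `host` chunk by chunk, as if each chunk were sent to a GPU.

    Each chunk is fetched with a halo of RADIUS * steps_per_block cells,
    so that steps_per_block time steps can run before the data goes back
    (temporal blocking). Only compressed bytes cross the simulated link.
    Returns the final array and the number of compressed bytes moved.
    """
    n = len(host)
    halo = RADIUS * steps_per_block
    transferred = 0
    for _ in range(blocks):
        new_host = list(host)
        for start in range(0, n, chunk):
            end = min(start + chunk, n)
            lo, hi = max(0, start - halo), min(n, end + halo)
            blob = compress(host[lo:hi])        # host side: compress, "send"
            transferred += len(blob)
            work = decompress(blob)             # "device" side: decompress
            for _ in range(steps_per_block):    # several steps per transfer
                work = stencil_step(work)
            result = compress(work[start - lo:end - lo])  # drop stale halo
            transferred += len(result)
            new_host[start:end] = decompress(result)
        host = new_host
    return host, transferred
```

In this sketch, raising `steps_per_block` lowers the number of transfers per time step at the cost of a wider halo, which is the trade-off that freed device memory is meant to ease; because the stand-in codec is lossless, the chunked result matches an in-core run exactly.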
Pages: 11055-11077
Page count: 23