SemCache++: Semantics-Aware Caching for Efficient Multi-GPU Offloading

Authors
Al-Saber, Nabeel [1]
Kulkarni, Milind [1]
Affiliations
[1] Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA
Keywords
Multi-GPU offloading; GPGPU; Communication optimization
DOI
10.1145/2688500.2688527
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Offloading computations to multiple GPUs is not an easy task. It requires decomposing data, distributing computations, and handling communication manually. GPU libraries have made it easier to offload computations to multiple GPUs by hiding this complexity inside library calls. However, such encapsulation prevents the reuse of data between successive kernel invocations, resulting in redundant communication. In this work, we introduce SemCache++, a semantics-aware GPU cache that automatically manages communication between the CPU and multiple GPUs and optimizes that communication by using caching to eliminate redundant transfers. SemCache++ is used to build the first multi-GPU drop-in replacement library that (a) uses virtual memory to automatically manage and optimize multi-GPU communication and (b) requires no program rewriting or annotations. Our caching technique is efficient: it uses a two-level caching directory to track matrices and submatrices. Experimental results show that our system can eliminate redundant communication and deliver significant performance improvements over multi-GPU libraries like CUBLASXT.
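The two-level directory idea mentioned in the abstract can be illustrated with a small sketch. This is not the SemCache++ implementation: the class name `TwoLevelDirectory`, its methods, and the tile-coordinate keys are hypothetical, and an explicit `invalidate` call stands in for the virtual-memory-based detection of CPU writes that the abstract describes. The first level maps a matrix (by base address) to a per-matrix table, and the second level maps each submatrix to the set of GPUs assumed to hold a valid copy.

```python
class TwoLevelDirectory:
    """Toy cache directory (hypothetical, for illustration only).

    Level 1: matrix base address -> per-matrix table.
    Level 2: submatrix (row_block, col_block) -> set of GPU ids holding a valid copy.
    """

    def __init__(self):
        self.matrices = {}   # level 1: base address -> {tile: set of GPU ids}
        self.transfers = 0   # number of simulated CPU-to-GPU copies

    def ensure_on_gpu(self, base, tile, gpu):
        """Return True if a transfer was needed, False on a cache hit."""
        tiles = self.matrices.setdefault(base, {})
        holders = tiles.setdefault(tile, set())
        if gpu in holders:
            return False     # this GPU already has a valid copy: skip the copy
        holders.add(gpu)     # record the new copy
        self.transfers += 1  # a real library would issue the copy here
        return True

    def invalidate(self, base):
        """Drop all GPU copies of a matrix, e.g. after the CPU writes to it."""
        self.matrices.pop(base, None)


def simulated_call(directory, base, tiles_per_gpu):
    """Simulate one library call that needs the given tiles on each GPU."""
    for gpu, tiles in tiles_per_gpu.items():
        for tile in tiles:
            directory.ensure_on_gpu(base, tile, gpu)
```

Under these assumptions, two successive simulated calls that read the same submatrices on the same GPUs transfer data only once; the second call hits in the directory until `invalidate` is called for that matrix.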
Pages: 255-256
Page count: 2