SemCache++: Semantics-Aware Caching for Efficient Multi-GPU Offloading

Cited by: 0

Authors:

Al-Saber, Nabeel [1]

Kulkarni, Milind [1]

Affiliations:

[1] Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA

Source:

ACM SIGPLAN NOTICES | 2015 / Vol. 50 / Issue 08

Keywords:

Multi-GPU offloading; GPGPU; Communication optimization;

DOI:

10.1145/2688500.2688527

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Offloading computations to multiple GPUs is not an easy task. It requires decomposing data, distributing computations, and handling communication manually. GPU libraries have made it easy to offload computations to multiple GPUs by hiding this complexity inside library calls. However, such encapsulation prevents data reuse between successive kernel invocations, resulting in redundant communication. In this work, we introduce SemCache++, a semantics-aware GPU cache that automatically manages communication between the CPU and multiple GPUs and optimizes that communication by using caching to eliminate redundant transfers. SemCache++ is used to build the first multi-GPU drop-in replacement library that (a) uses virtual memory to automatically manage and optimize multi-GPU communication and (b) requires no program rewriting or annotations. Our caching technique is efficient; it uses a two-level caching directory to track matrices and submatrices. Experimental results show that our system can eliminate redundant communication and deliver significant performance improvements over multi-GPU libraries like CUBLASXT.
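
The two-level directory idea mentioned in the abstract can be illustrated with a small toy model. The sketch below is not the authors' implementation and does not touch any GPU API; it only assumes a first level keyed by matrix and a second level keyed by submatrix (tile), each recording which devices hold a valid copy. All names (`TwoLevelDirectory`, `acquire_for_read`, `mark_written`, the device labels) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TileEntry:
    """Second-level entry: devices that hold a valid copy of one submatrix."""
    valid_on: set = field(default_factory=set)


class TwoLevelDirectory:
    """Toy directory: matrix id -> tile index -> devices with a valid copy.

    Hypothetical illustration only; real transfers are replaced by a counter.
    """

    def __init__(self):
        self.matrices = {}   # first level, keyed by matrix identifier
        self.transfers = 0   # number of simulated copies performed

    def _entry(self, matrix_id, tile):
        tiles = self.matrices.setdefault(matrix_id, {})
        if tile not in tiles:
            # Assumption: data initially lives only in host (CPU) memory.
            tiles[tile] = TileEntry(valid_on={"cpu"})
        return tiles[tile]

    def acquire_for_read(self, matrix_id, tile, device):
        """Return True if a transfer was needed, False on a cache hit."""
        entry = self._entry(matrix_id, tile)
        if device in entry.valid_on:
            return False          # valid copy already present: skip transfer
        entry.valid_on.add(device)
        self.transfers += 1       # simulate copying the tile to the device
        return True

    def mark_written(self, matrix_id, tile, device):
        """After a write, only the writing device holds a valid copy."""
        self._entry(matrix_id, tile).valid_on = {device}
```

In this model, a second read of the same tile on the same device is a hit and triggers no transfer, while a write on one device invalidates the copies elsewhere, so a later read on another device (or on the CPU) counts as one new transfer.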


Pages: 255 / 256

Page count: 2

7 items in total

[1] SemCache++: Semantics-Aware Caching for Efficient Multi-GPU Offloading
Al-Saber, Nabeel
Kulkarni, Milind
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING (ICS'15), 2015, : 79 - 88
[2] Locality-aware Optimizations for Improving Remote Memory Latency in Multi-GPU Systems
Belayneh, Leul
Ye, Haojie
Chen, Kuan-Yu
Blaauw, David
Mudge, Trevor
Dreslinski, Ronald
Talati, Nishil
PROCEEDINGS OF THE 2022 31ST INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PACT 2022, 2022, : 304 - 316
[3] NUMA-Aware Data-Transfer Measurements for Power/NVLink Multi-GPU Systems
Pearson, Carl
Chung, I-Hsin
Sura, Zehra
Hwu, Wen-Mei
Xiong, Jinjun
HIGH PERFORMANCE COMPUTING, ISC HIGH PERFORMANCE 2018, 2018, 11203 : 448 - 454
[4] MULTI-GPU DGEMM AND HIGH PERFORMANCE LINPACK ON HIGHLY ENERGY-EFFICIENT CLUSTERS
Rohr, David
Bach, Matthias
Kretz, Matthias
Lindenstruth, Volker
IEEE MICRO, 2011, 31 (05) : 18 - 26
[5] Efficient Multi-GPU Shared Memory via Automatic Optimization of Fine-Grained Transfers
Muthukrishnan, Harini
Nellans, David
Lustig, Daniel
Fessler, Jeffrey A.
Wenisch, Thomas F.
2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021), 2021, : 139 - 152
[6] Optimizing seam carving on multi-GPU systems for real-time content-aware image resizing
Ikjoon Kim
Jidong Zhai
Yan Li
Wenguang Chen
The Journal of Supercomputing, 2015, 71 : 3500 - 3524
[7] Optimizing seam carving on multi-GPU systems for real-time content-aware image resizing
Kim, Ikjoon
Zhai, Jidong
Li, Yan
Chen, Wenguang
JOURNAL OF SUPERCOMPUTING, 2015, 71 (09) : 3500 - 3524
