Revisiting Multi-pass Scatter and Gather on GPUs

Cited by: 5

Authors:

Lai, Zhuohang [1]

Luo, Qiong [1]

Jia, Xiaoying [2]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Kowloon, Hong Kong, Peoples R China

[2] Nvidia Corp, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE 47TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING | 2018

Keywords:

GPU; Virtual Memory Addressing; TLB; Irregular Memory Access; Data-parallel Primitives; TRANSLATION;

DOI:

10.1145/3225058.3225095

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Scatter and gather are two essential data-parallel primitives for memory-intensive applications. Their performance challenge lies in their irregular memory access patterns, especially on architectures with high memory latency, such as GPUs. Previous work has proposed multi-pass scatter and gather schemes to optimize their performance on earlier GPUs; on newer GPUs, however, anecdotal evidence showed that such schemes had little performance benefit on small datasets, and few studies have been conducted on larger datasets. Therefore, we propose a systematic study to re-evaluate the performance of multi-pass scatter and gather on three newer GPUs with various data sizes. Specifically, we micro-benchmark the undocumented Translation Lookaside Buffers (TLBs) on these GPUs to quantitatively analyze their performance impact. We then develop an analytical model to analyze the execution of irregular memory accesses and estimate the multi-pass performance. Our evaluation on the newer GPUs shows that (1) TLB caching can affect the performance of irregular memory accesses more significantly than data caching; (2) on datasets larger than the L3 TLB size, the multi-pass schemes, with a suitable number of passes, can reduce execution time by up to 87.8% relative to the single-pass version due to better TLB locality. Our model can predict the multi-pass performance on various GPUs, with an average accuracy of 92.9%. It can further suggest a suitable number of passes for the best performance.
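
For readers unfamiliar with the primitives named in the abstract, the following is a minimal CPU-side sketch of their semantics, not the authors' GPU kernels. It assumes one common formulation of the multi-pass scheme, in which the output is split into contiguous chunks and each pass writes only the elements whose destination falls in the current chunk, so the writes of a pass stay within a small address range. The function names and the `num_passes` parameter are hypothetical and chosen for illustration only.

```python
def scatter(values, idx):
    """Single-pass scatter: out[idx[k]] = values[k] (idx is a permutation)."""
    out = [None] * len(values)
    for v, i in zip(values, idx):
        out[i] = v
    return out


def gather(values, idx):
    """Single-pass gather: out[k] = values[idx[k]]."""
    return [values[i] for i in idx]


def multipass_scatter(values, idx, num_passes):
    """Multi-pass scatter (assumed formulation, for illustration only).

    The output is divided into num_passes contiguous chunks. Each pass
    scans the whole input but writes only the elements whose destination
    lies in the current chunk, trading extra reads for more local writes.
    """
    n = len(values)
    out = [None] * n
    chunk = (n + num_passes - 1) // num_passes  # ceiling division
    for p in range(num_passes):
        lo, hi = p * chunk, min((p + 1) * chunk, n)
        for v, i in zip(values, idx):
            if lo <= i < hi:
                out[i] = v
    return out


values = [10, 20, 30, 40, 50]
idx = [3, 0, 4, 1, 2]
single = scatter(values, idx)
multi = multipass_scatter(values, idx, num_passes=2)
```

In this sketch the multi-pass result is identical to the single-pass result for any number of passes; only the order and locality of the writes differ, which is the property the abstract attributes the TLB benefit to.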


Pages: 11

