Revisiting Multi-pass Scatter and Gather on GPUs

Cited by: 5
Authors
Lai, Zhuohang [1]
Luo, Qiong [1]
Jia, Xiaoying [2]
Affiliations
[1] Hong Kong Univ Sci & Technol, Kowloon, Hong Kong, Peoples R China
[2] Nvidia Corp, Beijing, Peoples R China
Keywords
GPU; Virtual Memory Addressing; TLB; Irregular Memory Access; Data-parallel Primitives; TRANSLATION;
DOI
10.1145/3225058.3225095
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Scatter and gather are two essential data-parallel primitives for memory-intensive applications. Their performance challenge lies in their irregular memory access patterns, especially on architectures with high memory latency, such as GPUs. Previous work proposed multi-pass scatter and gather schemes to optimize their performance on earlier GPUs; on newer GPUs, however, anecdotal evidence showed that such schemes had little performance benefit on small datasets, and few studies have been conducted on larger datasets. Therefore, we propose a systematic study to re-evaluate the performance of multi-pass scatter and gather on three newer GPUs with various data sizes. Specifically, we micro-benchmark the undocumented Translation Lookaside Buffers (TLBs) on these GPUs to quantitatively analyze their performance impact. We then develop an analytical model to analyze the execution of irregular memory accesses and estimate the multi-pass performance. Our evaluation on the newer GPUs shows that (1) TLB caching can affect the performance of irregular memory accesses more significantly than data caching; (2) on datasets larger than the L3 TLB size, the multi-pass schemes, with a suitable number of passes, can reduce execution time by up to 87.8% relative to the single-pass version due to better TLB locality. Our model can predict the multi-pass performance on various GPUs, with an average accuracy of 92.9%. It can further suggest a suitable number of passes for the best performance.
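The abstract does not spell out how a multi-pass scheme works, so the following is only a minimal CPU-side sketch in Python under one common formulation: the randomly accessed array is split into contiguous chunks, and each pass handles only the elements whose target falls in the current chunk, so that the random accesses of one pass stay within a smaller address range. The function names `multipass_scatter` and `multipass_gather` and the equal-sized chunking are assumptions made for illustration, not the authors' GPU implementation.

```python
def multipass_scatter(values, dest, out_size, num_passes):
    """Write values[i] to out[dest[i]], restricting each pass to one output chunk."""
    out = [None] * out_size
    chunk = -(-out_size // num_passes)  # ceiling division: chunk length per pass
    for p in range(num_passes):
        lo, hi = p * chunk, min((p + 1) * chunk, out_size)
        # Every pass scans all inputs but writes only into [lo, hi).
        for v, d in zip(values, dest):
            if lo <= d < hi:
                out[d] = v
    return out


def multipass_gather(src, index, num_passes):
    """Compute out[i] = src[index[i]], restricting each pass to one source chunk."""
    out = [None] * len(index)
    chunk = -(-len(src) // num_passes)  # ceiling division: chunk length per pass
    for p in range(num_passes):
        lo, hi = p * chunk, min((p + 1) * chunk, len(src))
        # Every pass scans all indices but reads only from [lo, hi).
        for i, s in enumerate(index):
            if lo <= s < hi:
                out[i] = src[s]
    return out
```

With `num_passes=1` both functions reduce to the ordinary single-pass primitives. More passes narrow the randomly accessed range per pass at the cost of rescanning the input, which is the trade-off the paper's model is said to capture when suggesting a suitable number of passes.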
Pages: 11
Related papers
50 in total
  • [21] Geometrical influences on multi-pass laser forming
    Edwardson, SP
    Abed, E
    Bartkowiak, K
    Dearden, G
    Watkins, KG
    JOURNAL OF PHYSICS D-APPLIED PHYSICS, 2006, 39 (02) : 382 - 389
  • [22] Matrix multi-pass scheme disk amplifier
    Perevezentsev, Evgeny
    Kuznetsov, Ivan
    Mukhin, Ivan
    Palashov, Oleg V.
    APPLIED OPTICS, 2017, 56 (30) : 8471 - 8476
  • [23] Nonlinear pulse compression in a multi-pass cell
    Schulte, Jan
    Sartorius, Thomas
    Weitenberg, Johannes
    Vernaleken, Andreas
    Russbueldt, Peter
    OPTICS LETTERS, 2016, 41 (19) : 4511 - 4514
  • [24] A multi-pass method for accelerated spectral sampling
    van de Ruit, M.
    Eisemann, E.
    COMPUTER GRAPHICS FORUM, 2021, 40 (07) : 141 - 148
  • [25] Event Coreference Resolution with Multi-Pass Sieves
    Lu, Jing
    Ng, Vincent
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016, : 3996 - 4003
  • [26] Simulations of transformation kinetics in a multi-pass weld
    Rojko, D
    Gliha, V
    MATERIALS AND MANUFACTURING PROCESSES, 2005, 20 (05) : 833 - 849
  • [27] Results of multi-pass nasolacrimal duct probing
    Wright, KW
    Mocan, MC
    Najera-Covarrubias, M
    Suarez, N
    AT THE CROSSINGS: PEDIATRIC OPHTHALMOLOGY AND STRABISMUS, 2004, : 251 - 255
  • [28] Multi-Pass Stamping Forming a Concave Ring
    Zhang, Song
    Shu, Xuedao
    Shi, Jianan
    Li, Zixuan
    APPLIED SCIENCES-BASEL, 2020, 10 (18):
  • [29] The problem of inhomogeneity in multi-pass drawing process
    Luksza, J
    Majta, J
    Skolyszewski, A
    Bator, A
    METAL FORMING 2000, 2000, : 589 - 596
  • [30] Multi-pass model based artistic rendering
    Mi, Xiao-Feng
    Chen, Xue-Song
    Tang, Min
    Dong, Jin-Xiang
    Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2003, 37 (06): 664 - 669