Revisiting Multi-pass Scatter and Gather on GPUs

Cited by: 5
Authors
Lai, Zhuohang [1]
Luo, Qiong [1]
Jia, Xiaoying [2]
Affiliations
[1] Hong Kong Univ Sci & Technol, Kowloon, Hong Kong, Peoples R China
[2] Nvidia Corp, Beijing, Peoples R China
Keywords
GPU; Virtual Memory Addressing; TLB; Irregular Memory Access; Data-parallel Primitives; TRANSLATION;
DOI
10.1145/3225058.3225095
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Scatter and gather are two essential data-parallel primitives for memory-intensive applications. Their performance challenge lies in their irregular memory access patterns, especially on architectures with high memory latency, such as GPUs. Previous work has proposed multi-pass scatter and gather schemes to optimize their performance on earlier GPUs; on newer GPUs, however, anecdotal evidence showed that such schemes had little performance benefit on small datasets, and few studies have been conducted on larger datasets. Therefore, we propose a systematic study to re-evaluate the performance of multi-pass scatter and gather on three newer GPUs with various data sizes. Specifically, we micro-benchmark the undocumented Translation Lookaside Buffers (TLBs) on these GPUs to quantitatively analyze their performance impact. We then develop an analytical model to analyze the execution of irregular memory accesses and estimate the multi-pass performance. Our evaluation on the newer GPUs shows that (1) TLB caching can affect the performance of irregular memory accesses more significantly than data caching; and (2) on datasets larger than the L3 TLB size, the multi-pass schemes, with a suitable number of passes, can reduce execution time by up to 87.8% relative to the single-pass version due to better TLB locality. Our model can predict the multi-pass performance on various GPUs with an average accuracy of 92.9%. It can further suggest a suitable number of passes for the best performance.
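To make the primitives named in the abstract concrete, the following is a minimal sequential sketch in Python, not the authors' GPU implementation. It assumes the commonly described form of multi-pass scatter, in which the output array is split into contiguous chunks and each pass writes only to destinations inside one chunk; the function names and the `num_passes` parameter are hypothetical and chosen for illustration only.

```python
def scatter(values, dest):
    """Single-pass scatter: out[dest[i]] = values[i]."""
    out = [None] * len(values)
    for i, d in enumerate(dest):
        out[d] = values[i]
    return out


def gather(values, src):
    """Gather: out[i] = values[src[i]]."""
    return [values[s] for s in src]


def multipass_scatter(values, dest, num_passes):
    """Multi-pass scatter (illustrative assumption, see lead-in).

    The output is divided into num_passes contiguous chunks. Each pass
    rescans the whole input but writes only to destinations that fall
    in the current chunk, so writes within a pass stay in a small
    address range.
    """
    n = len(values)
    out = [None] * n
    chunk = (n + num_passes - 1) // num_passes  # ceiling division
    for p in range(num_passes):
        lo, hi = p * chunk, min((p + 1) * chunk, n)
        for i, d in enumerate(dest):
            if lo <= d < hi:
                out[d] = values[i]
    return out


values = [10, 20, 30, 40]
perm = [2, 0, 3, 1]
single = scatter(values, perm)
multi = multipass_scatter(values, perm, num_passes=2)
gathered = gather(values, perm)
```

In this sketch each pass trades extra reads of the input for writes confined to a narrower region of the output, which is the locality effect the abstract attributes to the multi-pass schemes; the result is identical to the single-pass version.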
Pages: 11