Revisiting Multi-pass Scatter and Gather on GPUs

Cited by: 5
Authors
Lai, Zhuohang [1]
Luo, Qiong [1]
Jia, Xiaoying [2]
Affiliations
[1] Hong Kong Univ Sci & Technol, Kowloon, Hong Kong, Peoples R China
[2] Nvidia Corp, Beijing, Peoples R China
Keywords
GPU; Virtual Memory Addressing; TLB; Irregular Memory Access; Data-parallel Primitives; Translation
DOI
10.1145/3225058.3225095
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Scatter and gather are two essential data-parallel primitives for memory-intensive applications. Their performance challenge lies in their irregular memory access patterns, especially on architectures with high memory latency, such as GPUs. Previous work proposed multi-pass scatter and gather schemes to optimize their performance on earlier GPUs. On newer GPUs, however, anecdotal evidence showed that such schemes had little performance benefit on small datasets, and few studies have been conducted on larger datasets. We therefore present a systematic study that re-evaluates the performance of multi-pass scatter and gather on three newer GPUs with various data sizes. Specifically, we micro-benchmark the undocumented Translation Lookaside Buffers (TLBs) on these GPUs to quantitatively analyze their performance impact. We then develop an analytical model to analyze the execution of irregular memory accesses and estimate the multi-pass performance. Our evaluation on the newer GPUs shows that (1) TLB caching can affect the performance of irregular memory accesses more significantly than data caching; and (2) on datasets larger than the L3 TLB size, the multi-pass schemes, with a suitable number of passes, can reduce execution time by up to 87.8% compared with the single-pass version, due to better TLB locality. Our model can predict the multi-pass performance on various GPUs with an average accuracy of 92.9%. It can further suggest a suitable number of passes for the best performance.
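
To make the primitives named in the abstract concrete, the following is a minimal sequential sketch in Python. It is not the authors' GPU implementation. The function names (`scatter`, `gather`, `multipass_scatter`) and the equal-sized output chunking are assumptions made for illustration; on a GPU each loop iteration would typically map to one thread. The idea shown is that a multi-pass scatter restricts each pass to one contiguous region of the output, so that writes within a pass touch a smaller address range.

```python
def scatter(src, loc, n_out):
    """Single-pass scatter: out[loc[i]] = src[i]."""
    out = [None] * n_out
    for i, target in enumerate(loc):
        out[target] = src[i]
    return out


def gather(src, loc):
    """Single-pass gather: out[i] = src[loc[i]]."""
    return [src[target] for target in loc]


def multipass_scatter(src, loc, n_out, num_passes):
    """Scatter in num_passes passes; pass p writes only to output chunk p.

    Assumption for illustration: the output is split into equal-sized
    contiguous chunks, one per pass.
    """
    out = [None] * n_out
    chunk = -(-n_out // num_passes)  # ceiling division
    for p in range(num_passes):
        lo, hi = p * chunk, min((p + 1) * chunk, n_out)
        # Every pass rescans the whole input but writes only inside [lo, hi).
        for i, target in enumerate(loc):
            if lo <= target < hi:
                out[target] = src[i]
    return out


src = [10, 20, 30, 40, 50, 60]
loc = [4, 0, 5, 2, 1, 3]  # a permutation of the output indices
single = scatter(src, loc, len(src))
multi = multipass_scatter(src, loc, len(src), num_passes=3)
```

Each extra pass rereads the input sequentially, so the scheme trades additional regular reads for more localized irregular writes. This is why the number of passes has to be chosen to suit the dataset and hardware.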
Pages: 11