Efficient GPU Implementation of Affine Index Permutations on Arrays

Cited by: 0
Authors
Bouverot-Dupuis, Mathis [1]
Sheeran, Mary [2]
Affiliations
[1] ENS Paris, Paris, France
[2] Chalmers Univ, Gothenburg, Sweden
Source
PROCEEDINGS OF THE 11TH ACM SIGPLAN INTERNATIONAL WORKSHOP ON FUNCTIONAL HIGH-PERFORMANCE AND NUMERICAL COMPUTING, FHPNC 2023 | 2023
Funding
Swedish Research Council;
Keywords
GPU; data-parallelism; functional languages; ALGORITHMS;
DOI
10.1145/3609024.3609411
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Optimal usage of the memory system is a key element of fast GPU algorithms. Unfortunately many common algorithms fail in this regard despite exhibiting great regularity in memory access patterns. In this paper we propose efficient kernels to permute the elements of an array. We handle a class of permutations known as Bit Matrix Multiply Complement (BMMC) permutations, for which we design kernels of speed comparable to that of a simple array copy. This is a first step towards implementing a set of array combinators based on these permutations.
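As a rough illustration of the permutation class named in the abstract: a BMMC permutation on an array of length 2^n is usually defined as sending the element at index x to index A·x ⊕ c, where A is an invertible n×n bit matrix, c is an n-bit vector, and the arithmetic is over GF(2). The sketch below is a naive sequential reference in Python, not the GPU kernels the paper proposes. The function names and the packing of matrix rows into integers are assumptions made only for this example.

```python
def parity(x: int) -> int:
    """Return the XOR of all bits of x (0 or 1)."""
    return bin(x).count("1") & 1


def bmmc_index(rows, c, x):
    """Map index x to A*x XOR c over GF(2).

    rows[i] is row i of A packed into an int (bit j holds A[i][j]);
    row i determines bit i of the result. Hypothetical encoding.
    """
    y = 0
    for i, row in enumerate(rows):
        y |= parity(row & x) << i
    return y ^ c


def bmmc_permute(rows, c, src):
    """Scatter src[x] to position A*x XOR c; A is assumed invertible."""
    n = len(rows)
    if len(src) != 1 << n:
        raise ValueError("array length must be 2**n for an n x n matrix")
    dst = [None] * len(src)
    for x, value in enumerate(src):
        dst[bmmc_index(rows, c, x)] = value
    return dst


def identity_rows(n):
    """Identity matrix: output bit i copies input bit i."""
    return [1 << i for i in range(n)]


def bit_reversal_rows(n):
    """Anti-diagonal matrix: output bit i copies input bit n-1-i."""
    return [1 << (n - 1 - i) for i in range(n)]


if __name__ == "__main__":
    data = list(range(8))
    # Bit reversal is the BMMC permutation with an anti-diagonal A and c = 0.
    print(bmmc_permute(bit_reversal_rows(3), 0, data))
    # Identity A with c = 0b111 flips every index bit, which reverses the array.
    print(bmmc_permute(identity_rows(3), 0b111, data))
```

Bit reversal and array reversal are two familiar members of the class. Each iteration of the scatter loop is independent, which is what makes the permutation data-parallel; choosing which indices each thread handles so that memory accesses stay efficient is the part the paper addresses and this sketch does not.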
Pages: 15-28
Page count: 14