Efficient GPU Implementation of Affine Index Permutations on Arrays

Cited by: 0
Authors
Bouverot-Dupuis, Mathis [1]
Sheeran, Mary [2]
Affiliations
[1] ENS Paris, Paris, France
[2] Chalmers Univ, Gothenburg, Sweden
Source
PROCEEDINGS OF THE 11TH ACM SIGPLAN INTERNATIONAL WORKSHOP ON FUNCTIONAL HIGH-PERFORMANCE AND NUMERICAL COMPUTING, FHPNC 2023 | 2023
Funding
Swedish Research Council;
Keywords
GPU; data-parallelism; functional languages; ALGORITHMS;
DOI
10.1145/3609024.3609411
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Optimal usage of the memory system is a key element of fast GPU algorithms. Unfortunately, many common algorithms fail in this regard despite exhibiting great regularity in memory access patterns. In this paper we propose efficient kernels to permute the elements of an array. We handle a class of permutations known as Bit Matrix Multiply Complement (BMMC) permutations, for which we design kernels of speed comparable to that of a simple array copy. This is a first step towards implementing a set of array combinators based on these permutations.
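As a rough illustration of what a BMMC permutation computes, the sketch below applies one sequentially on the CPU. It assumes the standard definition, in which an n-bit source index x is sent to A·x XOR c, with A an invertible n×n bit matrix over GF(2) and c an n-bit complement vector. The names `bmmc_index`, `bmmc_permute`, and `bit_reversal_rows` are hypothetical, and this is only a reference for the index mapping, not the GPU kernels the paper proposes.

```python
def bmmc_index(rows, c, x):
    """Return A*x XOR c over GF(2).

    rows[i] is row i of the bit matrix A packed into an int
    (bit j of rows[i] is A[i][j]); c is the complement vector as an int.
    """
    y = 0
    for i, row in enumerate(rows):
        # Bit i of A*x is the parity of the bitwise AND of row i with x.
        parity = bin(row & x).count("1") & 1
        y |= parity << i
    return y ^ c


def bmmc_permute(rows, c, src):
    """Scatter src[x] to position bmmc_index(rows, c, x).

    Assumes len(src) == 2 ** len(rows) and that A is invertible over GF(2),
    so that every output slot is written exactly once.
    """
    out = [None] * len(src)
    for x, value in enumerate(src):
        out[bmmc_index(rows, c, x)] = value
    return out


# Bit reversal on 3-bit indices: A is the anti-diagonal matrix, c = 0.
bit_reversal_rows = [0b100, 0b010, 0b001]
```

For example, `bmmc_permute(bit_reversal_rows, 0, list(range(8)))` moves the element at index 1 (binary 001) to index 4 (binary 100). With the identity matrix and c = 0b111, every index bit is complemented, which reverses the array.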
Pages: 15-28
Page count: 14