Efficient GPU Implementation of Affine Index Permutations on Arrays

Cited by: 0
Authors
Bouverot-Dupuis, Mathis [1]
Sheeran, Mary [2]
Affiliations
[1] ENS Paris, Paris, France
[2] Chalmers Univ, Gothenburg, Sweden
Source
PROCEEDINGS OF THE 11TH ACM SIGPLAN INTERNATIONAL WORKSHOP ON FUNCTIONAL HIGH-PERFORMANCE AND NUMERICAL COMPUTING, FHPNC 2023 | 2023
Funding
Swedish Research Council;
Keywords
GPU; data-parallelism; functional languages; ALGORITHMS;
DOI
10.1145/3609024.3609411
Chinese Library Classification (CLC) number
TP31 [Computer software];
Subject classification code
081202 ; 0835 ;
Abstract
Optimal usage of the memory system is a key element of fast GPU algorithms. Unfortunately, many common algorithms fail in this regard despite exhibiting great regularity in memory access patterns. In this paper we propose efficient kernels to permute the elements of an array. We handle a class of permutations known as Bit Matrix Multiply Complement (BMMC) permutations, for which we design kernels of speed comparable to that of a simple array copy. This is a first step towards implementing a set of array combinators based on these permutations.
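For readers unfamiliar with the term, a BMMC permutation on an array of length 2^n is commonly defined by an invertible n-by-n bit matrix A and an n-bit complement vector c: the element at index x moves to index A·x XOR c, with the arithmetic done over GF(2). The Python sketch below is a plain CPU reference for that definition only, not the GPU kernels the paper proposes. The names `bmmc_index` and `bmmc_permute`, the packing of each matrix row into an integer, and the source-to-destination direction of the mapping are all assumptions made for illustration.

```python
def bmmc_index(x, rows, c):
    """Map index x to (A * x) XOR c over GF(2).

    rows[i] packs row i of the bit matrix A into an int (bit j is A[i][j]).
    c packs the complement vector into an int. Both encodings are assumed here.
    """
    y = 0
    for i, row in enumerate(rows):
        # Output bit i is the parity of (row AND x), i.e. a dot product over GF(2).
        bit = bin(row & x).count("1") & 1
        y |= bit << i
    return y ^ c


def bmmc_permute(src, rows, c):
    """Return a new list in which src[x] is stored at index bmmc_index(x, rows, c)."""
    n = len(rows)
    if len(src) != 1 << n:
        raise ValueError("array length must be 2**n for an n-by-n matrix")
    dst = [None] * len(src)
    filled = [False] * len(src)
    for x, value in enumerate(src):
        y = bmmc_index(x, rows, c)
        # A singular matrix sends two source indices to the same destination.
        if filled[y]:
            raise ValueError("matrix A is not invertible over GF(2)")
        dst[y] = value
        filled[y] = True
    return dst


# Bit reversal on 3-bit indices: output bit i takes input bit (2 - i).
BIT_REVERSAL_ROWS = [0b100, 0b010, 0b001]
reversed_order = bmmc_permute(list(range(8)), BIT_REVERSAL_ROWS, 0b000)

# Identity matrix with complement 0b11: every 2-bit index is XORed with 3.
complemented = bmmc_permute(list(range(4)), [0b01, 0b10], 0b11)
```

Bit reversal and index complement are two familiar members of this class; with the identity matrix and c = 0 the permutation reduces to a plain array copy.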
Pages: 15-28
Page count: 14
Related papers
50 in total
  • [31] An efficient GPU implementation for a faster simulation of unsteady bed-load transport
    Juez, Carmelo
    Lacasta, Asier
    Murillo, Javier
    Garcia-Navarro, Pilar
    JOURNAL OF HYDRAULIC RESEARCH, 2016, 54 (03) : 275 - 288
  • [32] BFS-4K: An Efficient Implementation of BFS for Kepler GPU Architectures
    Busato, Federico
    Bombieri, Nicola
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2015, 26 (07) : 1826 - 1838
  • [33] A Memory-Access-Efficient Implementation of the Approximate String Matching Algorithm on GPU
    Nunes, Lucas S. N.
    Bordim, J. L.
    Nakano, K.
    Ito, Y.
    2016 FOURTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR), 2016, : 483 - 489
  • [34] EFFICIENT IMPLEMENTATION OF INSAR TIME-CONSUMING ALGORITHM KERNELS ON GPU ENVIRONMENT
    Guerriero, Andrea
    Anelli, Vito Walter
    Pagliara, Alessandro
    Nutricato, Raffaele
    Nitti, Davide Oscar
    2015 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2015, : 4264 - 4267
  • [35] Implementation of Parallel Sparse Cholesky Factorization on GPU
    Zou, Dan
    Dou, Yong
    PROCEEDINGS OF 2012 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT 2012), 2012, : 2228 - 2232
  • [36] Learned Index on GPU
    Zhong, Xun
    Zhang, Yong
    Chen, Yu
    Li, Chao
    Xing, Chunxiao
    2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDEW 2022), 2022, : 117 - 122
  • [37] An efficient GPU implementation and scaling for higher-order 3D stencils
    Anjum, Omer
    Almasri, Mohammad
    de Gonzalo, Simon Garcia
    Hwu, Wen-mei
    INFORMATION SCIENCES, 2022, 586 : 326 - 343
  • [38] An efficient GPU implementation for large scale individual-based simulation of collective behavior
    Erra, Ugo
    Frola, Bernardino
    Scarano, Vittorio
    Couzin, Iain
    2009 INTERNATIONAL WORKSHOP ON HIGH PERFORMANCE COMPUTATIONAL SYSTEMS BIOLOGY, PROCEEDINGS, 2009, : 51 - +
  • [39] FAST AND EFFICIENT REAL-TIME GPU BASED IMPLEMENTATION OF WAVE FIELD SYNTHESIS
    Ranjan, Rishabh
    Gan, Woon-Seng
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [40] An Efficient GPU Implementation of a Multi-Start TSP Solver for Large Problem Instances
    Rocki, Kamil
    Suda, Reiji
    PROCEEDINGS OF THE FOURTEENTH INTERNATIONAL CONFERENCE ON GENETIC AND EVOLUTIONARY COMPUTATION COMPANION (GECCO'12), 2012, : 1441 - 1442