Efficient GPU Implementation of Affine Index Permutations on Arrays

Authors
Bouverot-Dupuis, Mathis [1]
Sheeran, Mary [2]
Affiliations
[1] ENS Paris, Paris, France
[2] Chalmers Univ, Gothenburg, Sweden
Source
PROCEEDINGS OF THE 11TH ACM SIGPLAN INTERNATIONAL WORKSHOP ON FUNCTIONAL HIGH-PERFORMANCE AND NUMERICAL COMPUTING, FHPNC 2023 | 2023
Funding
Swedish Research Council
Keywords
GPU; data-parallelism; functional languages; ALGORITHMS;
DOI
10.1145/3609024.3609411
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Optimal usage of the memory system is a key element of fast GPU algorithms. Unfortunately many common algorithms fail in this regard despite exhibiting great regularity in memory access patterns. In this paper we propose efficient kernels to permute the elements of an array. We handle a class of permutations known as Bit Matrix Multiply Complement (BMMC) permutations, for which we design kernels of speed comparable to that of a simple array copy. This is a first step towards implementing a set of array combinators based on these permutations.
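The abstract names BMMC permutations without defining them. As a rough illustration only, the sketch below uses the standard definition: for an array of length 2^n, index i (read as an n-bit vector) is sent to A·i XOR c, where A is an invertible n×n bit matrix over GF(2) and c is an n-bit complement vector. The function names `bmmc_index` and `bmmc_permute`, the row-packed encoding of A, and the scatter convention (element i lands at the mapped position) are assumptions made for this example; this is a sequential reference, not the GPU kernels the paper proposes.

```python
def bmmc_index(rows, c, i):
    """Map index i to (A * i) XOR c over GF(2).

    rows[k] packs row k of the bit matrix A into an int
    (bit j of rows[k] is A[k][j]); c is the complement vector.
    """
    j = 0
    for k, row in enumerate(rows):
        # Bit k of A*i is the parity of the bits shared by row k and i.
        parity = bin(row & i).count("1") & 1
        j |= parity << k
    return j ^ c


def bmmc_permute(rows, c, xs):
    """Scatter xs so that element i lands at position bmmc_index(rows, c, i)."""
    n = len(rows)
    if len(xs) != 1 << n:
        raise ValueError("array length must be 2**n for an n x n matrix")
    targets = [bmmc_index(rows, c, i) for i in range(len(xs))]
    if len(set(targets)) != len(xs):
        # A singular matrix maps two indices to the same place.
        raise ValueError("matrix is not invertible over GF(2)")
    out = [None] * len(xs)
    for x, t in zip(xs, targets):
        out[t] = x
    return out


# Bit reversal on 3-bit indices: output bit k takes input bit 2 - k.
bit_reversal = [0b100, 0b010, 0b001]
reversed_bits = bmmc_permute(bit_reversal, 0b000, list(range(8)))

# Identity matrix with an all-ones complement vector reverses the array.
identity = [0b001, 0b010, 0b100]
flipped = bmmc_permute(identity, 0b111, list(range(8)))
```

Bit reversal, array reversal, and matrix transposition of power-of-two-sized arrays all fall into this class, which is why one family of kernels can cover them.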
Pages: 15-28
Page count: 14