Efficient GPU Implementation of Affine Index Permutations on Arrays

Authors
Bouverot-Dupuis, Mathis [1 ]
Sheeran, Mary [2 ]
Affiliations
[1] ENS Paris, Paris, France
[2] Chalmers Univ, Gothenburg, Sweden
Source
PROCEEDINGS OF THE 11TH ACM SIGPLAN INTERNATIONAL WORKSHOP ON FUNCTIONAL HIGH-PERFORMANCE AND NUMERICAL COMPUTING, FHPNC 2023 | 2023
Funding
Swedish Research Council;
Keywords
GPU; data-parallelism; functional languages; ALGORITHMS;
DOI
10.1145/3609024.3609411
Chinese Library Classification
TP31 [Computer Software];
Discipline classification codes
081202 ; 0835 ;
Abstract
Optimal usage of the memory system is a key element of fast GPU algorithms. Unfortunately, many common algorithms fail in this regard despite exhibiting great regularity in their memory access patterns. In this paper, we propose efficient kernels to permute the elements of an array. We handle a class of permutations known as Bit Matrix Multiply Complement (BMMC) permutations, for which we design kernels with speed comparable to that of a simple array copy. This is a first step towards implementing a set of array combinators based on these permutations.
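For readers unfamiliar with the term, a BMMC permutation on an array of length 2^n sends the index x, viewed as an n-bit vector, to A·x ⊕ c, where A is an invertible n×n bit matrix over GF(2) and c is an n-bit complement vector. The sketch below is a naive CPU reference for that definition only, not the paper's GPU kernels. The function names, the packing of matrix rows into integers, and the scatter direction (`dst[A·x ⊕ c] = src[x]`) are assumptions made for illustration.

```python
def parity(x: int) -> int:
    """Return the XOR of all bits of x (0 or 1)."""
    return bin(x).count("1") & 1


def bmmc_index(rows, c, x):
    """Map index x to A*x XOR c over GF(2).

    rows[i] is row i of the bit matrix A packed as an integer bitmask
    (hypothetical encoding chosen for this sketch). Bit i of A*x is the
    parity of (rows[i] AND x).
    """
    y = 0
    for i, row in enumerate(rows):
        y |= parity(row & x) << i
    return y ^ c


def bmmc_permute(rows, c, src):
    """Scatter src into a new list: dst[A*x XOR c] = src[x].

    Assumes A is invertible over GF(2); otherwise the map is not a
    permutation and some slots of dst would be overwritten or left None.
    """
    n = len(rows)
    assert len(src) == 1 << n
    dst = [None] * len(src)
    for x, value in enumerate(src):
        dst[bmmc_index(rows, c, x)] = value
    return dst


def bit_reversal_rows(n):
    """Rows of the matrix that reverses the order of the n index bits."""
    # Row i has a single 1 in column n-1-i, so output bit i = input bit n-1-i.
    return [1 << (n - 1 - i) for i in range(n)]


def identity_rows(n):
    """Rows of the n x n identity matrix."""
    return [1 << i for i in range(n)]
```

With these helpers, `bmmc_permute(bit_reversal_rows(3), 0, list(range(8)))` gives the bit-reversal ordering of eight elements, and `bmmc_permute(identity_rows(3), 0b111, list(range(8)))` reverses the array, since complementing every index bit maps x to 7 - x.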
Pages: 15-28
Page count: 14