Efficient GPU Implementation of Affine Index Permutations on Arrays

Cited by: 0

Authors:

Bouverot-Dupuis, Mathis [1]

Sheeran, Mary [2]

Affiliations:

[1] ENS Paris, Paris, France

[2] Chalmers Univ, Gothenburg, Sweden

Source:

PROCEEDINGS OF THE 11TH ACM SIGPLAN INTERNATIONAL WORKSHOP ON FUNCTIONAL HIGH-PERFORMANCE AND NUMERICAL COMPUTING, FHPNC 2023 | 2023

Funding:

Swedish Research Council;

Keywords:

GPU; data-parallelism; functional languages; ALGORITHMS;

DOI:

10.1145/3609024.3609411

Chinese Library Classification number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Optimal usage of the memory system is a key element of fast GPU algorithms. Unfortunately, many common algorithms fail in this regard despite exhibiting great regularity in memory access patterns. In this paper, we propose efficient kernels to permute the elements of an array. We handle a class of permutations known as Bit Matrix Multiply Complement (BMMC) permutations, for which we design kernels of speed comparable to that of a simple array copy. This is a first step towards implementing a set of array combinators based on these permutations.
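
The abstract names BMMC permutations without defining them. As a minimal sketch, assuming the usual definition in which an index x of an array of length 2^n is read as an n-bit vector and sent to A·x XOR c over GF(2) for a nonsingular n×n bit matrix A, a naive reference version might look like the following. The names bmmc_index and bmmc_permute, the packed-row encoding of A, and the choice to move the element at x to the new index (rather than the reverse) are illustrative assumptions, not the paper's API. This sequential loop only illustrates the index map and does not reflect the optimized GPU kernels the paper proposes.

```python
def bmmc_index(rows, c, x):
    """Return A*x XOR c over GF(2).

    rows[i] is row i of the bit matrix A packed into an int
    (bit j of rows[i] is the entry A[i][j]); c is the complement vector.
    This encoding is an assumption made for the sketch.
    """
    y = 0
    for i, row in enumerate(rows):
        # Bit i of A*x is the parity of the bits shared by row i and x.
        bit = bin(row & x).count("1") & 1
        y |= bit << i
    return y ^ c


def bmmc_permute(rows, c, src):
    """Naive permutation: the element at index x moves to index A*x XOR c."""
    n = len(rows)
    if len(src) != 1 << n:
        raise ValueError("array length must be 2**n for an n x n bit matrix")
    targets = [bmmc_index(rows, c, x) for x in range(len(src))]
    # The map is a permutation only when A is nonsingular over GF(2).
    if len(set(targets)) != len(src):
        raise ValueError("bit matrix is singular; the map is not a permutation")
    dst = [None] * len(src)
    for x, y in enumerate(targets):
        dst[y] = src[x]
    return dst


# Bit reversal on 3-bit indices: A is the anti-diagonal matrix, c = 0.
bit_reversed = bmmc_permute([0b100, 0b010, 0b001], 0b000, list(range(8)))

# Identity matrix with c = 0b111 flips every index bit, reversing the array.
reversed_array = bmmc_permute([0b001, 0b010, 0b100], 0b111, list(range(8)))
```

Bit reversal and array reversal are two familiar members of this class under the stated definition; the first uses only the matrix part, the second only the complement part.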


Pages: 15 - 28

Page count: 14
