Efficient GPU Implementation of Affine Index Permutations on Arrays


Authors:

Bouverot-Dupuis, Mathis [1]

Sheeran, Mary [2]

Affiliations:

[1] ENS Paris, Paris, France

[2] Chalmers Univ, Gothenburg, Sweden

Source:

Proceedings of the 11th ACM SIGPLAN International Workshop on Functional High-Performance and Numerical Computing, FHPNC 2023 | 2023

Funding:

Swedish Research Council;

Keywords:

GPU; data-parallelism; functional languages; ALGORITHMS;

DOI:

10.1145/3609024.3609411

Chinese Library Classification:

TP31 [Computer Software];

Discipline classification codes:

081202 ; 0835 ;

Abstract:

Optimal usage of the memory system is a key element of fast GPU algorithms. Unfortunately, many common algorithms fail in this regard despite exhibiting great regularity in memory access patterns. In this paper, we propose efficient kernels to permute the elements of an array. We handle a class of permutations known as Bit Matrix Multiply Complement (BMMC) permutations, for which we design kernels of speed comparable to that of a simple array copy. This is a first step towards implementing a set of array combinators based on these permutations.
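
For readers unfamiliar with the term, the following is a minimal sketch of what a BMMC permutation computes. It assumes the standard definition from the literature, which the abstract does not spell out: for an array of length 2^n, the element at index x moves to index y = (A x) XOR c, where x is read as an n-bit vector, A is an invertible n-by-n bit matrix over GF(2), and c is an n-bit complement vector. The function names (`bmmc_index`, `bmmc_permute`, `bit_reversal_rows`) are hypothetical, and this sequential Python reference only illustrates the index mapping; it says nothing about how the paper's GPU kernels are organized.

```python
def bmmc_index(rows, c, x):
    """Map index x to (A x) XOR c over GF(2).

    rows[i] is row i of the bit matrix A packed into an int
    (bit j of rows[i] is A[i][j]). Bit i of the product is the
    parity of (rows[i] AND x).
    """
    y = 0
    for i, row in enumerate(rows):
        parity = bin(row & x).count("1") & 1
        y |= parity << i
    return y ^ c


def bmmc_permute(rows, c, src):
    """Return dst with dst[bmmc_index(rows, c, x)] = src[x].

    Assumes len(src) == 2**len(rows). Raises ValueError if two
    source indices collide, which happens when A is not invertible.
    """
    n = len(rows)
    if len(src) != 1 << n:
        raise ValueError("array length must be 2**n")
    dst = [None] * len(src)
    filled = [False] * len(src)
    for x, value in enumerate(src):
        y = bmmc_index(rows, c, x)
        if filled[y]:
            raise ValueError("matrix is singular: not a permutation")
        dst[y] = value
        filled[y] = True
    return dst


def bit_reversal_rows(n):
    """Anti-diagonal matrix: output bit i equals input bit n-1-i."""
    return [1 << (n - 1 - i) for i in range(n)]


if __name__ == "__main__":
    data = list(range(8))
    # Bit reversal on 3-bit indices (A anti-diagonal, c = 0).
    print(bmmc_permute(bit_reversal_rows(3), 0, data))  # [0, 4, 2, 6, 1, 5, 3, 7]
    # Identity matrix with c = 0b111 flips every index bit, reversing the array.
    print(bmmc_permute([1, 2, 4], 0b111, data))         # [7, 6, 5, 4, 3, 2, 1, 0]
```

Bit reversal and full array reversal are both instances of this class, obtained purely by the choice of A and c.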


Pages: 15-28

Page count: 14
