Design and implementation of an efficient integer count sort in CUDA GPUs

Cited by: 8
Authors
Kolonias, Vasileios [1]
Voyiatzis, Artemios G. [2]
Goulas, George [1]
Housos, Efthymios [1]
Affiliations
[1] Univ Patras, Comp Syst Lab, Dept Elect & Comp Engn, GR-26504 Patras, Greece
[2] Ind Syst Inst RC Athena, GR-26504 Patras, Greece
Keywords
count sort; counting sort; CUDA; GPU
DOI
10.1002/cpe.1776
Chinese Library Classification
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
We describe our experience with the design and implementation of an efficient count sort algorithm on Compute Unified Device Architecture (CUDA) graphics processing units (GPUs). The novelty of this work is twofold. First, we propose a count sort algorithm for integers that needs no synchronization in its last step and thus offers superior performance. Second, this work contributes ad hoc techniques for optimizing the performance of the algorithm on CUDA-enabled GPUs. Copyright (C) 2011 John Wiley & Sons, Ltd.
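The abstract does not spell out the algorithm, so the following is only a minimal CPU-side C++ sketch of one way a count sort can avoid synchronization in its last step. It assumes keys lie in a known range [0, num_keys). The function name `count_sort` and its parameters are hypothetical and are not taken from the paper. The idea shown is that, once the per-key counts and their prefix sums are known, each key value owns a disjoint slice of the output, so the final writes never conflict.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the paper): sorts integer keys in [0, num_keys).
std::vector<int> count_sort(const std::vector<int>& input, std::size_t num_keys) {
    // Step 1: histogram of key values. On a GPU this step would typically
    // need atomic increments, since many threads may hit the same counter.
    std::vector<std::size_t> count(num_keys, 0);
    for (int key : input) {
        assert(key >= 0 && static_cast<std::size_t>(key) < num_keys);
        ++count[static_cast<std::size_t>(key)];
    }

    // Step 2: exclusive prefix sum gives each key value its start offset.
    std::vector<std::size_t> offset(num_keys, 0);
    for (std::size_t k = 1; k < num_keys; ++k) {
        offset[k] = offset[k - 1] + count[k - 1];
    }

    // Step 3: key value k fills its own range [offset[k], offset[k] + count[k]).
    // The ranges are disjoint, so the outer-loop iterations are independent
    // of each other (assumption: this is where a GPU version could assign
    // one thread per key value without locks or atomics).
    std::vector<int> output(input.size());
    for (std::size_t k = 0; k < num_keys; ++k) {
        for (std::size_t i = 0; i < count[k]; ++i) {
            output[offset[k] + i] = static_cast<int>(k);
        }
    }
    return output;
}
```

For example, calling `count_sort({3, 1, 2, 1, 0}, 4)` returns the keys in ascending order. Writing the key value itself, rather than moving the original elements, only works for plain integers with no attached payload, which matches the integer-only setting the abstract describes.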
Pages: 2365-2381
Page count: 17