Compiling Generalized Histograms for GPU

Cited by: 11
Authors
Henriksen, Troels [1]
Hellfritzsch, Sune [1]
Sadayappan, Ponnuswamy [2]
Oancea, Cosmin [1]
Affiliations
[1] Univ Copenhagen, Copenhagen, Denmark
[2] Univ Utah, Salt Lake City, UT USA
Source
PROCEEDINGS OF SC20: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC20) | 2020
Keywords
GPU; parallelism; functional programming
DOI
10.1109/SC41405.2020.00101
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We present and evaluate an implementation technique for histogram-like computations on GPUs that ensures work-efficient asymptotic cost, support for arbitrary associative and commutative operators, and efficient use of hardware-supported atomic operations when applicable. Based on a systematic empirical examination of the design space, we develop a technique that balances conflict rates and memory footprint. We demonstrate our technique both as a library implementation in CUDA and by extending the parallel array language Futhark with a new construct for expressing generalized histograms, supported by several compiler optimizations. We show that our histogram implementation, taken in isolation, outperforms similar primitives from CUB, and that it is competitive with or outperforms the hand-written code of several application benchmarks, even when the latter is specialized for a class of datasets.
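To make the notion of a generalized histogram concrete, the following is a minimal sequential sketch in Python; it is not the paper's CUDA library or its Futhark construct. The function names `generalized_histogram` and `chunked_histogram`, the choice to ignore out-of-range bin indices, and the chunking scheme are assumptions made for illustration only. The second function suggests why the operator must be associative and commutative: partial histograms computed over separate chunks of the input can be merged in any order and still give the same result.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def generalized_histogram(
    op: Callable[[T, T], T],
    neutral: T,
    num_bins: int,
    indices: Sequence[int],
    values: Sequence[T],
) -> List[T]:
    """Combine each value into the bin named by its index, using `op`.

    `op` is assumed to be associative and commutative, with `neutral`
    as its neutral element. Out-of-range indices are ignored here
    (an assumption of this sketch).
    """
    hist = [neutral] * num_bins
    for i, v in zip(indices, values):
        if 0 <= i < num_bins:
            hist[i] = op(hist[i], v)
    return hist


def chunked_histogram(
    op: Callable[[T, T], T],
    neutral: T,
    num_bins: int,
    indices: Sequence[int],
    values: Sequence[T],
    num_chunks: int,
) -> List[T]:
    """Build one partial histogram per input chunk, then merge them.

    A hypothetical stand-in for independent workers that each update
    their own copy: more copies mean fewer update conflicts but a
    larger memory footprint. `num_chunks` must be at least 1.
    """
    n = len(indices)
    step = max(1, -(-n // num_chunks))  # ceiling division
    partials = [
        generalized_histogram(
            op, neutral, num_bins, indices[s:s + step], values[s:s + step]
        )
        for s in range(0, n, step)
    ]
    result = [neutral] * num_bins
    for partial in partials:
        result = [op(a, b) for a, b in zip(result, partial)]
    return result


if __name__ == "__main__":
    idx = [0, 2, 0, 1, 2, 5]
    val = [3, 1, 4, 1, 5, 9]
    # Ordinary histogram-style accumulation with addition.
    print(generalized_histogram(lambda a, b: a + b, 0, 3, idx, val))
    # The same input with max as the operator (0 is neutral for
    # non-negative values).
    print(chunked_histogram(max, 0, 3, idx, val, num_chunks=2))
```

With addition and a neutral element of 0 this reduces to an ordinary weighted histogram; swapping in `max`, `min`, or another associative and commutative operator changes only the `op` and `neutral` arguments.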
Pages: 14