BSGP: Bulk-synchronous GPU programming

Cited by: 77
Authors
Hou, Qiming [1]
Zhou, Kun
Guo, Baining [1]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
Source
ACM TRANSACTIONS ON GRAPHICS | 2008 / Vol. 27 / Issue 03
Keywords
programmable graphics hardware; stream processing; bulk synchronous parallel programming; thread manipulation
DOI
10.1145/1360612.1360618
Chinese Library Classification
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
We present BSGP, a new programming language for general-purpose computation on the GPU. A BSGP program looks much the same as a sequential C program. Programmers only need to supply a bare minimum of extra information to describe parallel processing on GPUs. As a result, BSGP programs are easy to read, write, and maintain. Moreover, the ease of programming does not come at the cost of performance. A well-designed BSGP compiler converts BSGP programs to kernels and combines them using optimally allocated temporary streams. In our benchmark, BSGP programs achieve similar or better performance than well-optimized CUDA programs, while the source code complexity and programming time are significantly reduced. To test BSGP's code efficiency and ease of programming, we implemented a variety of GPU applications, including a highly sophisticated X3D parser that would be extremely difficult to develop with existing GPU programming languages.
Pages: 13