Fast Scan Algorithms on Graphics Processors

被引:0
|
作者
Dotsenko, Yuri [1 ]
Govindaraju, Naga K. [1 ]
Sloan, Peter-Pike [1 ]
Boyd, Charles [1 ]
Manferdelli, John [1 ]
机构
[1] Microsoft Corp, Redmond, WA 98052 USA
来源
ICS'08: PROCEEDINGS OF THE 2008 ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING | 2008年
关键词
Scan; all-prefix-sum; segmented scan; GPGPU; GPU; parallel algorithm; HPC; many-core;
D O I
暂无
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Scan and segmented scan are important data-parallel primitives for a wide range of applications. We present fast, work-efficient algorithms for these primitives on graphics processing units (GPUs). We use novel data representations that map well to the GPU architecture. Our algorithms exploit shared memory to improve memory performance. We further improve the performance of our algorithms by eliminating shared-memory bank conflicts and reducing the overheads in prior shared-memory GPU algorithms. Furthermore, our algorithms are designed to work well on general data sets, including segmented arrays with arbitrary segment lengths. We also present optimizations to improve the performance of segmented scans based on the segment lengths. We implemented our algorithms on a PC with an NVIDIA GeForce 8800 GPU and compared our results with prior GPU-based algorithms. Our results indicate up to IN higher performance over prior algorithms on input sequences with millions of elements.
引用
收藏
页码:205 / 213
页数:9
相关论文
共 50 条
  • [1] Fast multipole methods on graphics processors
    Gumerov, Nail A.
    Duraiswami, Ramani
    JOURNAL OF COMPUTATIONAL PHYSICS, 2008, 227 (18) : 8290 - 8313
  • [2] FAST ACOUSTIC COMPUTATIONS USING GRAPHICS PROCESSORS
    Dixon, Paul R.
    Oonishi, Tasuku
    Furui, Sadaoki
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4321 - 4324
  • [3] Fast Blocking of Householder Reflectors on Graphics Processors
    Tomas, Andres
    Quintana-Orti, Enrique S.
    2018 26TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP 2018), 2018, : 385 - 393
  • [4] Parallelization Techniques for Implementing Trellis Algorithms on Graphics Processors
    Zheng, Q.
    Chen, Y.
    Dreslinski, R.
    Chakrabarti, C.
    Anastasopoulos, A.
    Mahlke, S.
    Mudge, T.
    2013 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2013, : 1220 - 1223
  • [5] Implementation and performance evaluation of reconstruction algorithms on graphics processors
    Diez, Daniel Castano
    Mueller, Hannes
    Frangakis, Achilleas S.
    JOURNAL OF STRUCTURAL BIOLOGY, 2007, 157 (01) : 288 - 295
  • [6] Efficient Mapping of Multiresolution Image Filtering Algorithms on Graphics Processors
    Membarth, Richard
    Hannig, Frank
    Dutta, Hritam
    Teich, Juergen
    EMBEDDED COMPUTER SYSTEMS: ARCHITECTURES, MODELING, AND SIMULATION, PROCEEDINGS, 2009, 5657 : 277 - 288
  • [7] Iterative regularization algorithms for constrained image deblurring on graphics processors
    Ruggiero, Valeria
    Serafini, Thomas
    Zanella, Riccardo
    Zanni, Luca
    JOURNAL OF GLOBAL OPTIMIZATION, 2010, 48 (01) : 145 - 157
  • [8] Iterative regularization algorithms for constrained image deblurring on graphics processors
    Valeria Ruggiero
    Thomas Serafini
    Riccardo Zanella
    Luca Zanni
    Journal of Global Optimization, 2010, 48 : 145 - 157
  • [9] Boosting Performance of Map Matching Algorithms by Parallelization on Graphics Processors
    Auer, Markus
    Rehborn, Hubert
    Molzahn, Sven-Eric
    Bogenberger, Klaus
    2017 28TH IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV 2017), 2017, : 462 - 467
  • [10] Auto-tuning of Fast Fourier Transform on Graphics Processors
    Dotsenkot, Yuri
    Baghsorkhi, Sara S.
    Lloyd, Brandon
    Govindaraju, Naga K.
    ACM SIGPLAN NOTICES, 2011, 46 (08) : 257 - 266