Optimal Parallel Algorithms for Computing the Sum, the Prefix-Sums, and the Summed Area Table on the Memory Machine Models

被引:16
|
作者
Nakano, Koji [1 ]
机构
[1] Hiroshima Univ, Dept Informat Engn, Higashihiroshima 7398527, Japan
关键词
memory machine models; prefix-sums computation; parallel algorithm; GPU; CUDA;
D O I
10.1587/transinf.E96.D.2626
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The main contribution of this paper is to show optimal parallel algorithms to compute the sum, the prefix-sums, and the summed area table on two memory machine models, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM). The DMM and the UMM are theoretical parallel computing models that capture the essence of the shared memory and the global memory of GPUs. These models have three parameters, the number p of threads, and the width w of the memory, and the memory access latency l. We first show that the sum of n numbers can be computed in O(n/w + nl/p + l log n) time units on the DMM and the UMM. We then go on to show that Omega(n/w + nl/p + l log n) time units are necessary to compute the sum. We also present a parallel algorithm that computes the prefix-sums of n numbers in O(n/w + nl/p + l log n) time units on the DMM and the UMM. Finally, we show that the summed area table of size root n x root n can be computed in O(n/w + nl/p + l log n) time units on the DMM and the UMM. Since the computation of the prefix-sums and the summed area table is at least as hard as the sum computation, these parallel algorithms are also optimal.
引用
收藏
页码:2626 / 2634
页数:9
相关论文
共 4 条
  • [1] Effectiveness of GPU Realizations of Parallel Prefix-Sums Computation Algorithms
    Stokfiszewski, Kamil
    Puchala, Dariusz
    Yatsymirskyy, Mykhaylo
    2018 IEEE 13TH INTERNATIONAL SCIENTIFIC AND TECHNICAL CONFERENCE ON COMPUTER SCIENCES AND INFORMATION TECHNOLOGIES (CSIT), VOL 1, 2018, : 436 - 439
  • [2] Parallel Algorithms for the Summed Area Table on the Asynchronous Hierarchical Memory Machine, with GPU implementations
    Kasagi, Akihiko
    Nakano, Koji
    Ito, Yasuaki
    2014 43RD INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP), 2014, : 251 - 260
  • [3] An Optimal Parallel Algorithm for Computing the Summed Area Table on the GPU
    Emoto, Yutaro
    Funasaka, Shunji
    Tokura, Hiroki
    Honda, Takumi
    Nakano, Koji
    Ito, Yasuaki
    2018 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2018), 2018, : 763 - 772
  • [4] Formal Verification of Parallel Stream Compaction and Summed-Area Table Algorithms
    Safari, Mohsen
    Huisman, Marieke
    THEORETICAL ASPECTS OF COMPUTING, ICTAC 2020, 2020, 12545 : 181 - 199