Adapting combined tiling to stencil optimizations on Sunway processor

Cited by: 3
Authors
Sun, Biao [1]
Li, Mingzhen [1]
Yang, Hailong [1]
Xu, Jun [2]
Luan, Zhongzhi [1]
Qian, Depei [1]
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China
[2] Beijing Simulat Ctr, Sci & Technol Special Syst Simulat Lab, Beijing 100854, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Stencil computation; Sunway processor; Performance optimization; Combined tiling; MODEL;
DOI
10.1007/s42514-023-00147-x
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Stencil computation is an indispensable pattern in scientific applications and a long-standing optimization target in high performance computing (HPC). The Sunway processor used in the Sunway TaihuLight supercomputer has demonstrated its performance potential with a unique heterogeneous many-core architecture. Although many optimization methods have been proposed, the memory-bound nature of stencil computation and the limited bandwidth of the Sunway processor make it challenging to run stencil computation efficiently on this processor. To better use the computation capability of the Sunway processor, we propose a combined tiling optimization for stencil computation tailored to its architectural features. In addition, we implement double buffering, vectorization, and register communication to further accelerate stencil computation on the Sunway processor. We evaluate our method on six stencil benchmarks with different orders and shapes (and thus different memory access patterns and computation intensities). The experimental results show that our implementation achieves a 1.97x speedup on average over the state-of-the-art stencil implementation on Sunway.
Pages: 322-333
Number of pages: 12