Adapting combined tiling to stencil optimizations on Sunway processor

Cited by: 3
Authors
Sun, Biao [1]
Li, Mingzhen [1]
Yang, Hailong [1]
Xu, Jun [2]
Luan, Zhongzhi [1]
Qian, Depei [1]
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China
[2] Beijing Simulat Ctr, Sci & Technol Special Syst Simulat Lab, Beijing 100854, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Stencil computation; Sunway processor; Performance optimization; Combined tiling; MODEL;
DOI
10.1007/s42514-023-00147-x
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Stencil computation is an indispensable pattern in scientific applications and a long-standing optimization target in high performance computing (HPC). The Sunway processor used in the Sunway TaihuLight supercomputer has demonstrated its performance potential with a unique heterogeneous many-core architecture. Although many optimization methods have been proposed, the memory-bound nature of stencil computation and the limited bandwidth of the Sunway processor make it challenging to run stencil computation efficiently on this processor. To better use the computation capability of the Sunway processor, we propose a combined tiling optimization for stencil computation tailored to its architectural features. In addition, we implement double buffering, vectorization, and register communication to further accelerate stencil computation on the Sunway processor. We evaluate our method on six stencil benchmarks with different orders and shapes (and thus different memory access patterns and computation intensities). The experimental results show that our implementation achieves a 1.97x speedup on average over the state-of-the-art stencil implementation on Sunway.
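As a generic illustration of the tiling idea the abstract refers to, the sketch below runs one sweep of a 2D 5-point stencil in two ways: directly over the whole grid, and tile by tile with each tile and its one-cell halo first copied into a small local buffer. It is not the paper's combined tiling scheme, whose details are not given in this record. The function names `stencil_naive` and `stencil_tiled`, the 0.2 coefficient, and the tile sizes are assumptions chosen for illustration, and the `local` buffer only stands in for a fast on-chip memory.

```python
def stencil_naive(grid):
    """One Jacobi sweep of a 5-point stencil; boundary cells are copied unchanged."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = 0.2 * (grid[i][j] + grid[i - 1][j] + grid[i + 1][j]
                               + grid[i][j - 1] + grid[i][j + 1])
    return out


def stencil_tiled(grid, tile_i=4, tile_j=4):
    """Same sweep, but interior points are visited one tile at a time.

    Each tile is copied, together with a one-cell halo, into `local`
    (a stand-in for a small fast memory) before it is computed.
    """
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i0 in range(1, n - 1, tile_i):
        i1 = min(i0 + tile_i, n - 1)          # exclusive end of tile rows
        for j0 in range(1, m - 1, tile_j):
            j1 = min(j0 + tile_j, m - 1)      # exclusive end of tile columns
            # Load the tile plus halo: rows i0-1..i1, columns j0-1..j1.
            local = [grid[i][j0 - 1:j1 + 1] for i in range(i0 - 1, i1 + 1)]
            # Compute only the tile interior; local index 1 maps to i0 / j0.
            for li in range(1, i1 - i0 + 1):
                for lj in range(1, j1 - j0 + 1):
                    out[i0 + li - 1][j0 + lj - 1] = 0.2 * (
                        local[li][lj] + local[li - 1][lj] + local[li + 1][lj]
                        + local[li][lj - 1] + local[li][lj + 1])
    return out
```

Both functions produce the same output for a single sweep. The tiled version only changes the order in which points are visited and the buffer they are read from, which is the property a tiling optimization relies on.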
Pages: 322-333
Page count: 12
Related papers
28 in total
  • [1] Adapting combined tiling to stencil optimizations on sunway processor
    Biao Sun
    Mingzhen Li
    Hailong Yang
    Jun Xu
    Zhongzhi Luan
    Depei Qian
    CCF Transactions on High Performance Computing, 2023, 5 : 322 - 333
  • [2] 26 PFLOPS Stencil Computations for Atmospheric Modeling on Sunway TaihuLight
    Ao, Yulong
    Yang, Chao
    Wang, Xinliang
    Xue, Wei
    Fu, Haohuan
    Liu, Fangfang
    Gan, Lin
    Xu, Ping
    Ma, Wenjing
    2017 31ST IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2017, : 535 - 544
  • [3] TOAST: Automatic tiling for iterative stencil computations on GPUs
    Rocha, Rodrigo C. O.
    Pereira, Alyson D.
    Ramos, Luiz
    Goes, Luis F. W.
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2017, 29 (08):
  • [4] Toward Accelerated Stencil Computation by Adapting Tensor Core Unit on GPU
    Liu, Xiaoyan
    Liu, Yi
    Yang, Hailong
    Liao, Jianjin
    Li, Mingzhen
    Luan, Zhongzhi
    Qian, Depei
    PROCEEDINGS OF THE 36TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ICS 2022, 2022,
  • [5] An Approach of Processor Core Customization for Stencil Computation
    Li, Yanhua
    Zhang, Youhui
    Yang, Jianfeng
    Luk, Wayne
    Yang, Guangwen
    Zheng, Weimin
    PROCEEDINGS OF THE 2014 IEEE 25TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP 2014), 2014, : 182 - +
  • [6] Extreme-scale Realistic Stencil Computations on Sunway TaihuLight with Ten Million Cores
    Cai, Ying
    Yang, Chao
    Ma, Wenjing
    Ao, Yulong
    2018 18TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2018, : 566 - 571
  • [7] Towards optimized tensor code generation for deep learning on sunway many-core processor
    Li, Mingzhen
    Liu, Changxi
    Liao, Jianjin
    Zheng, Xuegui
    Yang, Hailong
    Sun, Rujun
    Xu, Jun
    Gan, Lin
    Yang, Guangwen
    Luan, Zhongzhi
    Qian, Depei
    FRONTIERS OF COMPUTER SCIENCE, 2024, 18 (02)
  • [8] Towards optimized tensor code generation for deep learning on sunway many-core processor
    Mingzhen Li
    Changxi Liu
    Jianjin Liao
    Xuegui Zheng
    Hailong Yang
    Rujun Sun
    Jun Xu
    Lin Gan
    Guangwen Yang
    Zhongzhi Luan
    Depei Qian
    Frontiers of Computer Science, 2024, 18
  • [9] DHTS: A Dynamic Hybrid Tiling Strategy for Optimizing Stencil Computation on GPUs
    Liu, Song
    Zhang, Zengyuan
    Wu, Weiguo
    IEEE TRANSACTIONS ON COMPUTERS, 2023, 72 (10) : 2795 - 2807
  • [10] Evaluating optimizations that reduce global memory accesses of stencil computations in GPGPUs
    Nasciutti, Thiago Carrijo
    Panetta, Jairo
    Lopes, Pedro Pais
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2019, 31 (18):