Adapting combined tiling to stencil optimizations on Sunway processor

Cited by: 3
Authors
Sun, Biao [1]
Li, Mingzhen [1]
Yang, Hailong [1]
Xu, Jun [2]
Luan, Zhongzhi [1]
Qian, Depei [1]
Affiliations
[1] Beihang University, School of Computer Science and Engineering, Beijing 100191, People's Republic of China
[2] Beijing Simulation Center, Science and Technology on Special System Simulation Laboratory, Beijing 100854, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Stencil computation; Sunway processor; Performance optimization; Combined tiling; MODEL;
DOI
10.1007/s42514-023-00147-x
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Stencil computation is an indispensable computation pattern in scientific applications and a long-standing optimization target in high performance computing (HPC). The Sunway processor used in the Sunway TaihuLight supercomputer has demonstrated its performance potential through its unique heterogeneous many-core architecture. Although many optimization methods have been proposed, the memory-bound nature of stencil computation and the limited memory bandwidth of the Sunway processor make it challenging to run stencil computation efficiently on this processor. To better exploit the computing capability of the Sunway processor, we propose a combined tiling optimization of stencil computation tailored to its architectural features. In addition, we implement double buffering, vectorization, and register communication to further accelerate stencil computation on the Sunway processor. We evaluate our method on six stencil benchmarks with different orders and shapes, and thus different memory access patterns and computational intensities. The experimental results show that our implementation achieves an average speedup of 1.97x over the state-of-the-art stencil implementation on Sunway.
Pages: 322-333
Page count: 12