Automatic Code Generation and Optimization of Large-scale Stencil Computation on Many-core Processors

Cited by: 10
Authors
Li, Mingzhen [1, 2]
Liu, Yi [2]
Yang, Hailong [1, 2]
Hu, Yongmin [2]
Sun, Qingxiao [2]
Chen, Bangduo [2]
You, Xin [2]
Liu, Xiaoyan [2]
Luan, Zhongzhi [2]
Qian, Depei [2]
Affiliations
[1] State Key Lab Software Dev Environm, Beijing, Peoples R China
[2] Beihang Univ, Beijing, Peoples R China
Source
50TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING | 2021
Funding
National Natural Science Foundation of China;
Keywords
Stencil; Domain Specific Language; Performance Optimization; Manycore Architecture;
DOI
10.1145/3472456.3473517
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Stencil computation is an indispensable building block of many scientific applications and is widely used in numerical solvers for partial differential equations (PDEs). Due to the complex computation patterns of different stencils and the variety of hardware targets (e.g., many-core processors), many domain-specific languages (DSLs) have been proposed to optimize stencil computation. However, existing stencil DSLs mostly focus on performance optimizations for homogeneous many-core processors such as CPUs and GPUs, and fail to support emerging heterogeneous many-core processors such as Sunway. In addition, few of them can express stencils with multiple time dependencies or apply optimizations along both the spatial and temporal dimensions. Moreover, most stencil DSLs are unable to generate code that runs efficiently at large scale, which limits their practical applicability. In this paper, we propose MSC, a new stencil DSL designed to express stencil computation in both spatial and temporal dimensions. It can generate high-performance stencil code for large-scale execution on emerging many-core processors. Specifically, we design several optimization primitives for improving parallelism and data locality, and a communication library for efficient halo exchange in large-scale execution. Experimental results show that MSC achieves better performance than state-of-the-art stencil DSLs.
Pages: 12