StencilMART: Predicting Optimization Selection for Stencil Computations across GPUs

Cited by: 4
Authors
Sun, Qingxiao [1, 2]
Liu, Yi [2]
Yang, Hailong [1, 2]
Jiang, Zhonghui [2]
Luan, Zhongzhi [2]
Qian, Depei [2]
Affiliations
[1] State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China
Source
2022 IEEE 36th International Parallel and Distributed Processing Symposium (IPDPS 2022) | 2022
Funding
National Natural Science Foundation of China
Keywords
Stencil Computation; GPU; Optimization Strategies; Performance Prediction; Machine Learning;
DOI
10.1109/IPDPS53621.2022.00090
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Stencil computations are widely used in high performance computing (HPC) applications. Many HPC platforms utilize the high computation capability of GPUs to accelerate stencil computations. In recent years, stencils have become more diverse in terms of stencil order, memory accesses, and computation patterns. To adapt diverse stencils to GPUs, a variety of optimization techniques have been proposed, such as streaming and retiming. However, due to the diversity of stencil patterns and GPU architectures, no single optimization technique fits all stencils. Moreover, it is challenging to choose the most cost-efficient GPU for accelerating target stencils. To address these problems, we propose StencilMART, an automatic optimization selection framework that predicts the best optimization combination and the execution time under a given parameter setting for stencils on GPUs. Specifically, StencilMART represents stencil patterns as binary tensors and neighboring features through tensor assignment and feature extraction. In addition, StencilMART implements various machine learning methods, such as classification and regression, that utilize the stencil representation and hardware characteristics for execution time prediction. The experimental results show that StencilMART can achieve accurate optimization selection and performance prediction for various stencils across GPUs.
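To make the idea of a binary-tensor stencil representation concrete, the following is a minimal sketch of one plausible encoding for 2D stencils. It is an illustration only and is not taken from the paper: the function names (`stencil_to_binary_tensor`, `star_offsets`, `box_offsets`), the offset convention `(dy, dx)`, and the choice of a `(2*order+1) x (2*order+1)` grid are all assumptions.

```python
def stencil_to_binary_tensor(offsets, order):
    """Encode a 2D stencil as a binary grid (hypothetical encoding).

    offsets: iterable of (dy, dx) neighbor offsets relative to the center point.
    order:   stencil order, i.e. the largest offset magnitude along any axis.
    Returns a (2*order+1) x (2*order+1) list of lists with 1 at accessed points.
    """
    size = 2 * order + 1
    grid = [[0] * size for _ in range(size)]
    for dy, dx in offsets:
        if max(abs(dy), abs(dx)) > order:
            raise ValueError(f"offset {(dy, dx)} exceeds stencil order {order}")
        # Shift offsets so that the center point lands at index (order, order).
        grid[dy + order][dx + order] = 1
    return grid


def star_offsets(order):
    """Offsets of a 2D star stencil: the center plus points along each axis."""
    offsets = {(0, 0)}
    for k in range(1, order + 1):
        offsets.update({(k, 0), (-k, 0), (0, k), (0, -k)})
    return sorted(offsets)


def box_offsets(order):
    """Offsets of a 2D box stencil: every point within the given order."""
    return [(dy, dx)
            for dy in range(-order, order + 1)
            for dx in range(-order, order + 1)]


star = stencil_to_binary_tensor(star_offsets(1), order=1)
box = stencil_to_binary_tensor(box_offsets(1), order=1)

for row in star:
    print(row)
```

Under this assumed encoding, stencils of different shapes but the same order map to same-sized grids, which is one reason a fixed-size binary layout is convenient as input to a classifier or regressor.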
Pages: 875-885
Page count: 11