StencilMART: Predicting Optimization Selection for Stencil Computations across GPUs

Cited by: 4

Authors:

Sun, Qingxiao [1,2]

Liu, Yi [2]

Yang, Hailong [1,2]

Jiang, Zhonghui [2]

Luan, Zhongzhi [2]

Qian, Depei [2]

Affiliations:

[1] State Key Lab Software Dev Environm, Beijing 100191, Peoples R China

[2] Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China

Source:

2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2022) | 2022

Funding:

National Natural Science Foundation of China;

Keywords:

Stencil Computation; GPU; Optimization Strategies; Performance Prediction; Machine Learning;

DOI:

10.1109/IPDPS53621.2022.00090

Chinese Library Classification (CLC) number:

TP3 [Computing Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Stencil computations are widely used in high-performance computing (HPC) applications. Many HPC platforms use the high computational capability of GPUs to accelerate stencil computations. In recent years, stencils have become more diverse in terms of stencil order, memory accesses, and computation patterns. To adapt diverse stencils to GPUs, a variety of optimization techniques, such as streaming and retiming, have been proposed. However, due to the diversity of stencil patterns and GPU architectures, no single optimization technique fits all stencils. Moreover, it is challenging to choose the most cost-efficient GPU for accelerating target stencils. To address these problems, we propose StencilMART, an automatic optimization selection framework that predicts the best optimization combination and the execution time under a given parameter setting for stencils on GPUs. Specifically, StencilMART represents stencil patterns as binary tensors and neighboring features through tensor assignment and feature extraction. In addition, StencilMART implements various machine learning methods, such as classification and regression, that use the stencil representation and hardware characteristics for execution time prediction. The experimental results show that StencilMART can achieve accurate optimization selection and performance prediction for various stencils across GPUs.
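As a rough illustration of what representing a stencil pattern as a binary tensor could look like, the following minimal sketch marks each accessed neighbor offset of a 2D stencil with a 1 in a small grid. This record does not describe the paper's actual encoding, so the grid layout and the helper names `star_offsets_2d` and `stencil_to_binary_tensor` are assumptions made purely for illustration.

```python
def star_offsets_2d(radius):
    """Offsets (dy, dx) touched by a 2D star stencil of the given radius (order)."""
    offsets = {(0, 0)}
    for k in range(1, radius + 1):
        offsets.update({(k, 0), (-k, 0), (0, k), (0, -k)})
    return sorted(offsets)


def stencil_to_binary_tensor(offsets, radius):
    """Hypothetical encoding: a (2*radius+1) x (2*radius+1) grid of 0/1 values,
    where 1 means the stencil reads the neighbor at that offset from the center."""
    size = 2 * radius + 1
    grid = [[0] * size for _ in range(size)]
    for dy, dx in offsets:
        if abs(dy) > radius or abs(dx) > radius:
            raise ValueError(f"offset {(dy, dx)} lies outside radius {radius}")
        grid[dy + radius][dx + radius] = 1
    return grid


# The classic 5-point stencil is a star stencil of radius 1.
five_point = stencil_to_binary_tensor(star_offsets_2d(1), 1)
for row in five_point:
    print(row)
```

A fixed-size grid of this kind could be passed to a classifier or regressor alongside hardware features, which is the general idea the abstract describes; the concrete model inputs used by StencilMART are not specified here.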

Citation

Pages: 875-885

Page count: 11
