Moka: Model-based Concurrent Kernel Analysis

Cited by: 0

Authors:

Yu, Leiming ^{[1]}

Gong, Xun ^{[1]}

Sun, Yifan ^{[1]}

Fang, Qianqian ^{[1]}

Rubin, Norm ^{[2]}

Kaeli, David ^{[1]}

Affiliations:

[1] Northeastern Univ, Boston, MA 02115 USA

[2] NVIDIA Res, Santa Clara, CA USA

Source:

PROCEEDINGS OF THE 2017 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC) | 2017

Keywords:

GPU; Concurrent Kernel Execution; Empirical Model; GRAPHICS;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Today's GPUs continue to increase the number of compute resources with each new generation. Many data-parallel applications have been re-engineered to leverage the thousands of cores on the GPU. But not every kernel can fully utilize all the resources available. Many applications contain multiple kernels that could potentially be run concurrently. To better utilize the massive resources on the GPU, device vendors have started to support Concurrent Kernel Execution (CKE). However, the application throughput provided by CKE is subject to a number of factors, including the kernel configuration attributes, the dynamic behavior of each kernel (e.g., compute-intensive vs. memory-intensive), the kernel launch order, and inter-kernel dependencies. Minor changes in any of these factors can have a large impact on the effectiveness of CKE. In this paper, we present Moka, an empirical model for tuning concurrent kernel performance. Moka allows us to accurately predict the resulting performance and scalability of multi-kernel applications when using CKE. We consider both static and dynamic workload characteristics that impact the utility of CKE, and leverage these metrics to drive kernel scheduling decisions on NVIDIA GPUs. The underlying data transfer pattern and GPU resource contention are analyzed in detail. Our model is able to accurately predict the performance ceiling of concurrent kernel execution. We validate our model using several real-world applications that have multiple kernels that can run concurrently, and evaluate CKE performance on an NVIDIA Maxwell GPU. Our model is able to predict the performance of CKE applications accurately, providing estimates that differ by less than 12% from actual runtime performance. Using our estimates, we can quickly find the best CKE strategy for our applications to achieve improved application throughput. We believe we have developed a useful tool to help application programmers accelerate their applications using CKE.

Pages: 197-206

Page count: 10
