Moka: Model-based Concurrent Kernel Analysis

Citations: 0
Authors
Yu, Leiming [1]
Gong, Xun [1]
Sun, Yifan [1]
Fang, Qianqian [1]
Rubin, Norm [2]
Kaeli, David [1]
Affiliations
[1] Northeastern Univ, Boston, MA 02115 USA
[2] NVIDIA Res, Santa Clara, CA USA
Source
PROCEEDINGS OF THE 2017 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC) | 2017
Keywords
GPU; Concurrent Kernel Execution; Empirical Model; GRAPHICS;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Today's GPUs continue to increase the number of compute resources with each new generation. Many data-parallel applications have been re-engineered to leverage the thousands of cores on the GPU. But not every kernel can fully utilize all the resources available. Many applications contain multiple kernels that could potentially be run concurrently. To better utilize the massive resources on the GPU, device vendors have started to support Concurrent Kernel Execution (CKE). However, the application throughput provided by CKE is subject to a number of factors, including the kernel configuration attributes, the dynamic behavior of each kernel (e.g., compute-intensive vs. memory-intensive), the kernel launch order, and inter-kernel dependencies. Minor changes in any of these factors can have a large impact on the effectiveness of CKE. In this paper, we present Moka, an empirical model for tuning concurrent kernel performance. Moka allows us to accurately predict the resulting performance and scalability of multi-kernel applications when using CKE. We consider both static and dynamic workload characteristics that impact the utility of CKE, and leverage these metrics to drive kernel scheduling decisions on NVIDIA GPUs. The underlying data transfer pattern and GPU resource contention are analyzed in detail. Our model is able to accurately predict the performance ceiling of concurrent kernel execution. We validate our model using several real-world applications that have multiple kernels that can run concurrently, and evaluate CKE performance on an NVIDIA Maxwell GPU. Our model is able to predict the performance of CKE applications accurately, providing estimates that differ by less than 12% from actual runtime performance. Using our estimates, we can quickly find the best CKE strategy for our applications to achieve improved application throughput. We believe we have developed a useful tool to help application programmers accelerate their applications using CKE.
Pages: 197 - 206
Page count: 10
Related papers
50 in total
  • [21] Review and prospect of model-based fault diagnosis technology for liquid rocket engines
    Chen Z.
    Chen H.
    Gao Y.
    Zhang H.
    [J]. Hangkong Xuebao/Acta Aeronautica et Astronautica Sinica, 2023, 44 (23):
  • [22] GPU-Based Algorithms for Processing the k Nearest-Neighbor Query on Spatial Data Using Partitioning and Concurrent Kernel Execution
    Polychronis Velentzas
    Michael Vassilakopoulos
    Antonio Corral
    Christos Antonopoulos
    [J]. International Journal of Parallel Programming, 2023, 51 : 275 - 308
  • [23] GPU-Based Algorithms for Processing the k Nearest-Neighbor Query on Spatial Data Using Partitioning and Concurrent Kernel Execution
    Velentzas, Polychronis
    Vassilakopoulos, Michael
    Corral, Antonio
    Antonopoulos, Christos
    [J]. INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2023, 51 (06) : 275 - 308
  • [24] Online monitoring and diagnosis of batch processes: empirical model-based framework and a case study
    Cho, HW
    Kim, KJ
    Jeong, MK
    [J]. INTERNATIONAL JOURNAL OF PRODUCTION RESEARCH, 2006, 44 (12) : 2361 - 2378
  • [25] A Model-Based Software Generation Approach Qualified for Heterogeneous GPGPU-Enabled Platforms
    Endt, Holger
    Stolz, Lothar
    Wechs, Martin
    Stechele, Walter
    [J]. APPLICATIONS, TOOLS AND TECHNIQUES ON THE ROAD TO EXASCALE COMPUTING, 2012, 22 : 217 - 223
  • [26] UNSTEADY MODEL-BASED PREDICTIVE CONTROL OF CONTINUOUS STEEL CASTING BY MEANS OF A VERY FAST DYNAMIC SOLIDIFICATION MODEL ON A GPU
    Klimes, Lubomir
    Stetina, Josef
    [J]. MATERIALI IN TEHNOLOGIJE, 2014, 48 (04) : 525 - 530
  • [27] TurboMGNN: Improving Concurrent GNN Training Tasks on GPU With Fine-Grained Kernel Fusion
    Wu, Wenchao
    Shi, Xuanhua
    He, Ligang
    Jin, Hai
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2023, 34 (06) : 1968 - 1981
  • [28] Assessing and Improving the Suitability of Model-Based Design for GPU-Accelerated Railway Control Systems
    Calderon, Alejandro J.
    Kosmidis, Leonidas
    Nicolas, Carlos F.
    de Lasala, Javier
    Larranaga, Ion
    [J]. ARCHITECTURE OF COMPUTING SYSTEMS (ARCS 2021), 2021, 12800 : 68 - 83
  • [29] A static analytical performance model for GPU kernel
    Li, Jingjin
    Chen, Qingkui
    Liu, Bocheng
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL SCIENCE AND ENGINEERING, 2019, 18 (02) : 201 - 210
  • [30] Experimental analysis of biodiesel synthesis from palm kernel oil: empirical model and surface response variables
    Manuel Alejandro Mayorga Betancourt
    Camilo Andres López Santamaria
    Mauricio López Gómez
    Alberth Renne Gonzalez Caranton
    [J]. Reaction Kinetics, Mechanisms and Catalysis, 2020, 131 : 297 - 317