Automatically Selecting Profitable Thread Block Sizes for Accelerated Kernels

Times cited: 9
Authors
Connors, Tiffany A. [1]
Qasem, Apan [1]
Affiliation
[1] Texas State Univ, Comp Sci Dept, San Marcos, TX 78666 USA
Source
2017 19TH IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS (HPCC) / 2017 15TH IEEE INTERNATIONAL CONFERENCE ON SMART CITY (SMARTCITY) / 2017 3RD IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (DSS) | 2017
Funding
US National Science Foundation
DOI
10.1109/HPCC-SmartCity-DSS.2017.58
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Graphics processing units (GPUs) provide high performance at low power consumption, provided their resources are well utilized. Thread block size is one factor that determines a kernel's occupancy, a metric for measuring GPU utilization. A common guideline is to choose the block size that yields the highest occupancy. However, many combinations of block and grid sizes can achieve the highest occupancy, and performance can vary significantly among these configurations, because differences in thread structure lead to different utilization of hardware resources. Optimizing for occupancy alone is therefore insufficient; thread structure must also be considered. Setting the block size is the programmer's responsibility, but selecting the right size is not always intuitive. In this paper, we propose using machine learning to automatically select profitable block sizes. We also show that machine learning techniques coupled with performance counters can provide insight into the underlying reasons for performance variance between configurations.
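To illustrate why "highest occupancy" alone does not pin down a block size, the following Python sketch computes a simplified theoretical occupancy (active warps divided by the maximum resident warps per streaming multiprocessor) for every block size that is a multiple of 32. The per-SM limits in `SMLimits` are assumed illustrative values, not figures from the paper, and the model ignores register and shared-memory allocation granularity. `theoretical_occupancy` and `best_block_sizes` are hypothetical helpers, not the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Per-SM hardware limits. Illustrative, assumed values (not from the paper)."""
    warp_size: int = 32
    max_threads: int = 2048
    max_blocks: int = 32
    registers: int = 65536
    shared_mem_bytes: int = 65536


def theoretical_occupancy(block_size, regs_per_thread, smem_per_block, hw=SMLimits()):
    """Fraction of the SM's maximum resident warps that are active.

    Simplified model: ignores register/shared-memory allocation granularity.
    """
    warps_per_block = -(-block_size // hw.warp_size)  # ceiling division
    max_warps = hw.max_threads // hw.warp_size
    # Each resource independently caps the number of resident blocks per SM.
    limits = [hw.max_blocks, max_warps // warps_per_block]
    if regs_per_thread > 0:
        regs_per_block = regs_per_thread * warps_per_block * hw.warp_size
        limits.append(hw.registers // regs_per_block)
    if smem_per_block > 0:
        limits.append(hw.shared_mem_bytes // smem_per_block)
    resident_blocks = min(limits)
    return resident_blocks * warps_per_block / max_warps


def best_block_sizes(regs_per_thread, smem_per_block, hw=SMLimits()):
    """Return the highest occupancy and every block size that reaches it."""
    candidates = range(hw.warp_size, 1024 + 1, hw.warp_size)
    occ = {b: theoretical_occupancy(b, regs_per_thread, smem_per_block, hw)
           for b in candidates}
    top = max(occ.values())
    return top, [b for b, o in occ.items() if o == top]


if __name__ == "__main__":
    print(best_block_sizes(regs_per_thread=32, smem_per_block=0))
```

Under these assumed limits, with 32 registers per thread and no shared memory, block sizes 64, 128, 256, 512, and 1024 all tie at an occupancy of 1.0. Occupancy therefore leaves several candidates whose actual run times may still differ, which is the gap a learned selector is meant to close.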
Pages: 442-449
Number of pages: 8