Understanding Stencil Code Performance On MultiCore Architectures

Cited by: 23
Authors
Rahman, Shah M. Faizur [1]
Yi, Qing [1]
Qasem, Apan [2]
Affiliations
[1] Univ Texas San Antonio, Dept Comp Sci, San Antonio, TX 78249 USA
[2] Texas State Univ, Dept Comp Sci, San Marcos, TX USA
Source
PROCEEDINGS OF THE 2011 8TH ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS (CF 2011) | 2011
Funding
National Science Foundation (USA);
DOI
10.1145/2016604.2016641
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Stencil computations are the foundation of many large applications in scientific computing. Previous research has shown that several optimization mechanisms, including rectangular blocking and time skewing combined with wavefront- and pipeline-based parallelization, can significantly improve the performance of stencil kernels on multi-core architectures. However, the overall performance impact of these optimizations is difficult to predict due to the interplay of load imbalance, synchronization overhead, and cache locality. This paper presents a detailed performance study of these optimizations by applying them in a wide variety of configurations, using hardware counters to monitor the efficiency of architectural components, and then developing a set of formulas via regression analysis to model their overall performance impact in terms of the affected hardware counter values. We have applied our methodology to three stencil computation kernels: a 7-point Jacobi, a 27-point Jacobi, and a 7-point Gauss-Seidel computation. Our experimental results show that a precise formula can be developed for each kernel to accurately model the overall performance impact of varying optimizations and thereby effectively guide the performance analysis and tuning of these kernels.
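For readers unfamiliar with the kernels named in the abstract, the following is a minimal sketch of a 7-point Jacobi sweep and a rectangularly blocked variant of it. It is not the authors' code: the function names, the unweighted 7-point average, the fixed boundary layer, and the block sizes are all assumptions chosen for illustration. The sketch only shows that blocking reorders the traversal of the grid without changing the result; time skewing and parallelization are not covered.

```python
# Illustrative sketch only; names, coefficients and block sizes are assumptions,
# not taken from the paper.

def make_grid(n, fill=0.0):
    """Allocate an n x n x n grid as nested lists."""
    return [[[fill for _ in range(n)] for _ in range(n)] for _ in range(n)]


def jacobi7_point(src, i, j, k):
    """Unweighted average of a cell and its six face neighbours (assumed weights)."""
    return (src[i][j][k]
            + src[i - 1][j][k] + src[i + 1][j][k]
            + src[i][j - 1][k] + src[i][j + 1][k]
            + src[i][j][k - 1] + src[i][j][k + 1]) / 7.0


def jacobi7_sweep(src, dst, n):
    """One Jacobi time step over the interior, in plain loop order."""
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                dst[i][j][k] = jacobi7_point(src, i, j, k)


def jacobi7_sweep_blocked(src, dst, n, bi, bj, bk):
    """Same time step, traversed in rectangular blocks of size bi x bj x bk.

    Jacobi reads only from src and writes only to dst, so the blocks can be
    visited in any order; blocking changes locality, not the values computed.
    """
    for ii in range(1, n - 1, bi):
        for jj in range(1, n - 1, bj):
            for kk in range(1, n - 1, bk):
                for i in range(ii, min(ii + bi, n - 1)):
                    for j in range(jj, min(jj + bj, n - 1)):
                        for k in range(kk, min(kk + bk, n - 1)):
                            dst[i][j][k] = jacobi7_point(src, i, j, k)
```

In this sketch the block sizes `bi`, `bj`, `bk` are the tunable configuration; the study summarized above varies such parameters and relates the resulting hardware counter readings to overall performance.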
Pages: 10
Related papers
29 in total
[1] Adhianto L., 2009, CONCURRENCY IN PRESS
[2] Allen Randy, 2002, OPTIMIZING COMPILERS
[3] [Anonymous], 2006, P 2006 WORKSHOP MEMO, DOI 10.1145/1178597
[4] [Anonymous], 2000, Supercomputing, ACM/IEEE 2000 Conference
[5] [Anonymous], 2008, P 2008 ACM IEEE C SU
[6] Bondhugula U., 2008, PLDI 08, P101
[7] Cavazos J, 2007, INT SYM CODE GENER, P185
[8] Chen C, 2005, INT SYM CODE GENER, P111
[9] Christen M, 2009, INT PARALL DISTRIB P, P547
[10] Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors [J]. Datta, Kaushik; Kamil, Shoaib; Williams, Samuel; Oliker, Leonid; Shalf, John; Yelick, Katherine. SIAM REVIEW, 2009, 51 (01): 129-159