Improving memory hierarchy performance for irregular applications using data and computation reorderings

被引:65
作者
Mellor-Crummey, J
Whalley, D
Kennedy, K
机构
[1] Rice Univ, Dept Comp Sci, Houston, TX 77005 USA
[2] Florida State Univ, Dept Comp Sci, Tallahassee, FL 32306 USA
关键词
memory hierarchy optimization; data reordering; computation reordering; space-filling curves; multi-level blocking;
D O I
10.1023/A:1011119519789
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
The performance of irregular applications on modern computer systems is hurt by the wide gap between CPU and memory speeds because these applications typically under-utilize multi-level memory hierarchies, which help hide this gap. This paper investigates using data and computation reorderings to improve memory hierarchy utilization for irregular applications. We evaluate the impact of reordering on data reuse at different levels in the memory hierarchy. We focus on coordinated data and computation reordering based on space-filling curves and we introduce a new architecture-independent multi-level blocking strategy for irregular applications. For two particle codes we studied, the most effective reorderings reduced overall execution time by a factor of two and four, respectively. Preliminary experience with a scatter benchmark derived from a large unstructured mesh application showed that careful data and computation ordering reduced primary cache misses by a factor of two compared to a random ordering.
引用
收藏
页码:217 / 247
页数:31
相关论文
共 37 条
[1]  
ABUSUFAH W, 1979, P 1979 NAT COMP C, P969
[2]  
ALFURAIH I, 1998, P INT PAR P S MARCH
[3]  
ALLEN JR, 1984, SIGPLAN NOTICES, V19, P233, DOI 10.1145/502949.502897
[4]   CHARMM - A PROGRAM FOR MACROMOLECULAR ENERGY, MINIMIZATION, AND DYNAMICS CALCULATIONS [J].
BROOKS, BR ;
BRUCCOLERI, RE ;
OLAFSON, BD ;
STATES, DJ ;
SWAMINATHAN, S ;
KARPLUS, M .
JOURNAL OF COMPUTATIONAL CHEMISTRY, 1983, 4 (02) :187-217
[5]  
CALLAHAN D, 1990, P ACM SIGPLAN 90 C P, P53
[6]  
CUTHILL E, 1969, P ACM NAT C
[7]   DESIGN AND IMPLEMENTATION OF A PARALLEL UNSTRUCTURED EULER SOLVER USING SOFTWARE PRIMITIVES [J].
DAS, R ;
MAVRIPLIS, DJ ;
SALTZ, J ;
GUPTA, S ;
PONNUSAMY, R .
AIAA JOURNAL, 1994, 32 (03) :489-496
[8]   Improving cache performance in dynamic applications through data and computation reorganization at run time [J].
Ding, C ;
Kennedy, K .
ACM SIGPLAN NOTICES, 1999, 34 (05) :229-241
[9]  
FERRANTE J, 1991, P 4 WORKSH LANG COMP
[10]   Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code [J].
Frens, JD ;
Wise, DS .
ACM SIGPLAN NOTICES, 1997, 32 (07) :206-216