Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms

Cited by: 34
Authors
Williams, Samuel [1, 2]
Carter, Jonathan [1]
Oliker, Leonid [1]
Shalf, John [1]
Yelick, Katherine [1, 2]
Affiliations
[1] Lawrence Berkeley Natl Lab Berkeley, CRD NERSC, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, CS Div, Berkeley, CA 94720 USA
Keywords
Lattice Boltzmann; Auto-tuning; Multicore; Cell broadband engine; Niagara;
DOI
10.1016/j.jpdc.2009.04.002
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of search-based performance optimizations, popular in linear algebra and FFT libraries, to application-specific computational kernels. Our work applies this strategy to a lattice Boltzmann application (LBMHD) that historically has made poor use of scalar microprocessors due to its complex data structures and memory access patterns. We explore one of the broadest sets of multicore architectures in the high-performance computing (HPC) literature, including the Intel Xeon E5345 (Clovertown), AMD Opteron 2214 (Santa Rosa), AMD Opteron 2356 (Barcelona), Sun T5140 T2+ (Victoria Falls), as well as a QS20 IBM Cell Blade. Rather than hand-tuning LBMHD for each system, we develop a code generator that allows us to identify a highly optimized version for each platform, while amortizing the human programming effort. Results show that our auto-tuned LBMHD application achieves up to a 15 times improvement compared with the original code at a given concurrency. Additionally, we present a detailed analysis of each optimization, which reveals surprising hardware bottlenecks and software challenges for future multicore systems and applications. (c) 2009 Elsevier Inc. All rights reserved.
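The search-based strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' code generator and is unrelated to LBMHD itself: it is a toy stand-in, assuming a single tunable parameter (a hypothetical block size) and a trivial blocked summation kernel. The names `make_variant`, `autotune`, and `candidates` are illustrative only. It generates one kernel variant per candidate value, times each, and keeps the fastest.

```python
import time


def make_variant(block):
    """Build a blocked summation kernel for one (hypothetical) block size."""
    def kernel(data):
        total = 0.0
        # Walk the input in chunks of `block` elements.
        for start in range(0, len(data), block):
            total += sum(data[start:start + block])
        return total
    return kernel


def autotune(candidates, data, repeats=3):
    """Time every candidate variant; return (best_block, timings)."""
    timings = {}
    for block in candidates:
        kernel = make_variant(block)
        best = float("inf")
        # Keep the minimum over several runs to reduce timing noise.
        for _ in range(repeats):
            t0 = time.perf_counter()
            kernel(data)
            best = min(best, time.perf_counter() - t0)
        timings[block] = best
    best_block = min(timings, key=timings.get)
    return best_block, timings


data = [float(i) for i in range(10000)]
candidates = [16, 64, 256, 1024]
best_block, timings = autotune(candidates, data)
print("fastest block size on this machine:", best_block)
```

The chosen block size depends on the machine it runs on, which is the point of the approach: the search is repeated per platform instead of hand-tuning one variant for each.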
Pages: 762-777
Page count: 16