CFD Builder: A Library Builder for Computational Fluid Dynamics
Cited by: 2
Authors:
Jayaraj, Jagan [1]
Lin, Pei-Hung [2]
Woodward, Paul R. [3]
Yew, Pen-Chung [3]
Affiliations:
[1] Sandia Natl Labs, POB 5800, Albuquerque, NM 87185 USA
[2] Lawrence Livermore Natl Lab, Livermore, CA USA
[3] Univ Minnesota, Minneapolis, MN USA
Source:
Keywords:
source-to-source;
high performance;
CFD;
hierarchical data layout;
briquette;
DOI:
10.1109/IPDPSW.2014.117
Chinese Library Classification (CLC) number:
TP301 [Theory, Methods];
Subject classification code:
081202;
Abstract:
Computational fluid dynamics (CFD) is an important area of scientific computing. After about two decades of experience with MPI, the weak scaling of CFD codes is well understood. As a result, per-node performance has become crucial to overall machine performance. However, even with multi-threading, obtaining good performance on each core remains extremely challenging, primarily because of memory bandwidth limitations and the difficulty of using short SIMD engines effectively. This work presents techniques and a tool for improving in-core performance. Fundamental to the strategy is a hierarchical data layout in which the problem state is stored in small cubical structures that fit well in the cache hierarchy. The difficulty of computing spatial derivatives (also called near-neighbor computation in the literature) in a hierarchical data layout is well known; hence, such a layout has rarely been used in finite difference codes. This work discusses how to program relatively easily for such a layout, the inefficiencies of this programming strategy, and how to overcome them. The key technique for eliminating the overheads is called pipeline-for-reuse. It is followed by a storage optimization called maximal array contraction. Applying pipeline-for-reuse and maximal array contraction by hand is highly tedious and error-prone. Therefore, we built a source-to-source translator called CFD Builder to automate the transformations using directives. The directive-based approach leverages domain experts' knowledge of the code and eliminates the need for complex analysis before program transformation. We demonstrate the effectiveness of this approach using three applications on two architectures with two compilers, and observe performance improvements of up to 6.92x. We believe this approach could enable library and application writers to build efficient CFD libraries.
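As an illustration of the hierarchical data layout described in the abstract, the sketch below maps a global cell index to a flat storage offset in which every small cube of cells (a "briquette", per the keywords) is stored contiguously. It is a minimal sketch and not CFD Builder's actual implementation: the function name `briquette_offset`, the edge length `b=4`, and the ordering of briquettes and of cells within a briquette are all assumptions made for the example.

```python
def briquette_offset(i, j, k, nx, ny, nz, b=4):
    """Map global cell (i, j, k) to a flat offset in briquette-major order.

    Hypothetical helper for illustration; b is an assumed briquette edge length.
    """
    if nx % b or ny % b or nz % b:
        raise ValueError("grid extents must be multiples of the briquette edge")
    if not (0 <= i < nx and 0 <= j < ny and 0 <= k < nz):
        raise IndexError("cell index out of range")

    # Split each coordinate into (briquette index, index inside the briquette).
    bi, li = divmod(i, b)
    bj, lj = divmod(j, b)
    bk, lk = divmod(k, b)

    # Number of briquettes along x and y (assumed x-fastest ordering).
    nbx, nby = nx // b, ny // b

    briquette_id = (bk * nby + bj) * nbx + bi  # position of the briquette
    local_id = (lk * b + lj) * b + li          # position inside the briquette

    # All b**3 cells of one briquette occupy one contiguous run of offsets.
    return briquette_id * b**3 + local_id
```

With this ordering, the cells of one briquette sit next to each other in memory, which is what lets a briquette fit in cache as a unit. The cost is that a near-neighbor access can cross into a different briquette, which is the difficulty the abstract refers to.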
Pages: 1030-1039
Page count: 10