Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units

Cited by: 0
Authors
XIONG QinGang [1]
Affiliations
[2] Graduate University of Chinese Academy of Sciences
Funding
National Natural Science Foundation of China
Keywords
asynchronous execution; compute unified device architecture; graphic processing unit; lattice Boltzmann method; non-blocking message passing interface; OpenMP;
DOI
Not available
Chinese Library Classification number
TP391.41
Discipline classification code
080203
Abstract
Many-core processors, such as graphic processing units (GPUs), are promising platforms for intrinsically parallel algorithms such as the lattice Boltzmann method (LBM). Although tremendous speedup has been obtained on a single GPU compared with mainstream CPUs, the performance of the LBM on multiple GPUs has not been studied extensively and systematically. In this article, we carry out LBM simulations on a GPU cluster with many nodes, each having multiple Fermi GPUs. Asynchronous execution with CUDA stream functions, OpenMP and non-blocking MPI communication are incorporated to improve efficiency. The algorithm is tested for two-dimensional Couette flow and the results are in good agreement with the analytical solution. For both the one- and two-dimensional decomposition of space, the algorithm performs well because most of the communication time is hidden. Direct numerical simulation of a two-dimensional gas-solid suspension containing more than one million solid particles and one billion gas lattice cells demonstrates the potential of this algorithm in large-scale engineering applications. The algorithm can be directly extended to the three-dimensional decomposition of space and to other modeling methods, including explicit grid-based methods.
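The abstract's idea of hiding communication time behind computation can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy one-dimensional lattice with only a right-moving and a left-moving population, and it uses a Python thread pool as a stand-in for non-blocking MPI and CUDA streams. All function names (`stream_global`, `fetch_halos`, `stream_decomposed`, `split`, `join`) are hypothetical. Each subdomain posts its halo request first, updates its interior cells while the request is in flight, and fills in the edge cells once the halo values arrive.

```python
from concurrent.futures import ThreadPoolExecutor


def stream_global(right, left):
    """Reference: periodic streaming on the undivided lattice."""
    n = len(right)
    new_right = [right[(i - 1) % n] for i in range(n)]  # moves one cell to the right
    new_left = [left[(i + 1) % n] for i in range(n)]    # moves one cell to the left
    return new_right, new_left


def split(right, left, parts):
    """Cut the lattice into equal subdomains (assumes len(right) % parts == 0)."""
    m = len(right) // parts
    return [(right[k * m:(k + 1) * m], left[k * m:(k + 1) * m]) for k in range(parts)]


def join(subdomains):
    """Concatenate subdomains back into global arrays."""
    right = [v for sub_right, _ in subdomains for v in sub_right]
    left = [v for _, sub_left in subdomains for v in sub_left]
    return right, left


def fetch_halos(subdomains, rank):
    """Stand-in for a non-blocking receive: read the neighbours' edge values."""
    n = len(subdomains)
    right_prev, _ = subdomains[(rank - 1) % n]
    _, left_next = subdomains[(rank + 1) % n]
    return right_prev[-1], left_next[0]


def stream_decomposed(subdomains):
    """One streaming step with halo exchange overlapped with interior work."""
    with ThreadPoolExecutor() as pool:
        # 1. Post every halo request before doing any computation.
        pending = [pool.submit(fetch_halos, subdomains, rank)
                   for rank in range(len(subdomains))]
        # 2. Update interior cells, which need no remote data.
        updated = []
        for right, left in subdomains:
            new_right = [None] + right[:-1]  # cell 0 waits for the left neighbour
            new_left = left[1:] + [None]     # last cell waits for the right neighbour
            updated.append((new_right, new_left))
        # 3. Wait for the halos and complete the edge cells.
        for (new_right, new_left), request in zip(updated, pending):
            halo_right, halo_left = request.result()
            new_right[0] = halo_right
            new_left[-1] = halo_left
    return updated


if __name__ == "__main__":
    right = list(range(8))
    left = [10 * i for i in range(8)]
    result = join(stream_decomposed(split(right, left, 2)))
    print(result == stream_global(right, left))
```

The old populations are only read during the step and the new ones are written to fresh lists, so the in-flight halo reads cannot race with the interior update. In a real multi-GPU code the same three-phase ordering would typically map to posting non-blocking sends and receives, launching the interior kernel, and then waiting before the boundary kernel.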
Pages: 707-715
Number of pages: 9