Accelerating Large-Scale CFD Simulations with Lattice Boltzmann Method on a 40-Million-Core Sunway Supercomputer

Cited by: 4
Authors
Liu, Zhao [1]
Chu, Xuesen [1, 2, 3]
Lv, Xiaojing [2]
Liu, Hanyue [4]
Fu, Haohuan [4, 5]
Yang, Guangwen [1, 4]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
[2] China Ship Sci Res Ctr, Wuxi, Jiangsu, Peoples R China
[3] Taihu Lab DeepSea Technol Sci, Wuxi, Jiangsu, Peoples R China
[4] Natl Supercomp Ctr Wuxi, Wuxi, Jiangsu, Peoples R China
[5] Tsinghua Univ, Dept Earth Syst Sci, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 52ND INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2023 | 2023
Keywords
Lattice Boltzmann Method; Sunway Supercomputer; heterogeneous systems; parallel scalability; MODELS;
DOI
10.1145/3605573.3605605
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The Lattice Boltzmann Method (LBM) has gained widespread popularity due to its applicability in fluid dynamics, chemical engineering, material science, and other domains. In this work, we present an optimized implementation of the LBM, with a specific focus on achieving superior performance and scalability on advanced heterogeneous systems such as the new Sunway supercomputer. To accomplish this, we employ several techniques, including kernel fusion to enhance temporal and spatial locality, a customized multi-level domain decomposition and data sharing scheme, and pipelining strategies that are tailored to the SW26010-Pro processor. As a result of these optimizations, we have successfully scaled our code to a total of 39,000,000 CPU cores. Our largest simulation, which encompassed over 42 trillion lattice cells, achieved an impressive 67,018 billion lattice cell updates per second (GLUPS), with 82.9% memory bandwidth utilization, and a sustained performance of 28 PFlops. In order to assess the portability of our implementation, we also adapted our code to run on a GPU cluster, utilizing a range of tailored optimization techniques. Our results demonstrated a 191x speedup, along with 83.8% memory bandwidth utilization. Our proposed approach marks a significant milestone in the field of LBM implementations, as it demonstrates unprecedented scalability by effectively utilizing over 39,000,000 cores while maintaining exceptional parallel efficiency and computational performance. This achievement establishes our method as a compelling solution for addressing large-scale computational fluid dynamics challenges on heterogeneous systems.
Pages: 797-806
Page count: 10
Related papers
35 in total
[1]  
Bailey Peter, 2009, Proceedings of the 2009 International Conference on Parallel Processing (ICPP 2009), P550, DOI 10.1109/ICPP.2009.38
[2]  
Chase N., 2012, Simulations of the DARPA SUBOFF Submarine Including Self-Propulsion with the E1619 Propeller
[3]   Early experience on porting and running a Lattice Boltzmann code on the Xeon-Phi co-processor [J].
Crimi, G. ;
Mantovani, F. ;
Pivanti, M. ;
Schifano, S. F. ;
Tripiccione, R. .
2013 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, 2013, 18 :551-560
[4]   Equivalent partial differential equations of a lattice Boltzmann scheme [J].
Dubois, Francois .
COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2008, 55 (07) :1441-1449
[5]   swDNN: A Library for Accelerating Deep Learning Applications on Sunway TaihuLight [J].
Fang, Jiarui ;
Fu, Haohuan ;
Zhao, Wenlai ;
Chen, Bingwei ;
Zheng, Weijie ;
Yang, Guangwen .
2017 31ST IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2017, :615-624
[6]   Performance modeling and analysis of heterogeneous lattice Boltzmann simulations on CPU-GPU clusters [J].
Feichtinger, Christian ;
Habich, Johannes ;
Koestler, Harald ;
Ruede, Ulrich ;
Aoki, Takayuki .
PARALLEL COMPUTING, 2015, 46 :1-13
[7]  
Fietz J, 2012, LECT NOTES COMPUT SC, V7484, P818, DOI 10.1007/978-3-642-32820-6_81
[8]   The Sunway TaihuLight supercomputer: system and applications [J].
Fu, Haohuan ;
Liao, Junfeng ;
Yang, Jinzhe ;
Wang, Lanning ;
Song, Zhenya ;
Huang, Xiaomeng ;
Yang, Chao ;
Xue, Wei ;
Liu, Fangfang ;
Qiao, Fangli ;
Zhao, Wei ;
Yin, Xunqiang ;
Hou, Chaofeng ;
Zhang, Chenglong ;
Ge, Wei ;
Zhang, Jian ;
Wang, Yangang ;
Zhou, Chunbo ;
Yang, Guangwen .
SCIENCE CHINA-INFORMATION SCIENCES, 2016, 59 (07)
[9]   Assessment of micro-wind turbines performance in the urban environments: An aided methodology through geographical information systems [J].
Gagliano A. ;
Nocera F. ;
Patania F. ;
Capizzi A. .
Int. J. Energy Environ. Eng., 1 (1-14) :1-14
[10]   A Framework for Hybrid Parallel Flow Simulations with a Trillion Cells in Complex Geometries [J].
Godenschwager, Christian ;
Schornbaum, Florian ;
Bauer, Martin ;
Koestler, Harald ;
Ruede, Ulrich .
2013 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC), 2013,