Communication-overlap techniques for improved strong scaling of gyrokinetic Eulerian code beyond 100k cores on the K-computer

Cited by: 17
Authors
Idomura, Yasuhiro [1 ,2 ]
Nakata, Motoki [2 ]
Yamada, Susumu [1 ]
Machida, Masahiko [1 ]
Imamura, Toshiyuki [3 ]
Watanabe, Tomohiko
Nunami, Masanori
Inoue, Hikaru
Tsutsumi, Shigenobu
Miyoshi, Ikuo
Shida, Naoyuki
Affiliations
[1] Japan Atom Energy Agcy, Ctr Computat Sci & E Syst, Kashiwa, Chiba 2778587, Japan
[2] Japan Atom Energy Agcy, Fus Res & Dev Directorate, Kashiwa, Chiba 2778587, Japan
[3] RIKEN, Adv Inst Computat Sci, Wako, Saitama, Japan
Keywords
Fusion plasma turbulence; gyrokinetic simulation; Eulerian approach; communication overlap; K-computer
DOI
10.1177/1094342013490973
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Plasma turbulence research based on five-dimensional (5D) gyrokinetic simulations is one of the most critical and demanding issues in fusion science. To pioneer new physics regimes both in problem sizes and in timescales, an improvement of strong scaling is essential. Overlap of computations and communications using non-blocking MPI communication schemes is a promising approach to improving strong scaling, but it often fails on practical applications with conventional MPI libraries. In this work, this classical issue is resolved by developing communication-overlap techniques with additional MPI support for non-blocking communication routines and with heterogeneous OpenMP threads, which work even on conventional MPI libraries and network hardware. These techniques dramatically improved the parallel efficiency of the gyrokinetic toroidal 5D Eulerian code GT5D on the K-computer, which has a dedicated network, and on the Helios system, which has a commodity network. On the K-computer, excellent strong scaling was achieved beyond 100k cores whilst keeping a sustained performance of ~10% (~307 TFlops using 196,608 cores), and simulations for next-generation large-scale fusion experiments are significantly accelerated. This performance is a 16x speedup over the maximum performance reported at the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (~19 TFlops using 16,384 cores of the BX900 cluster) (Idomura, 2011).
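The two headline figures in the abstract can be cross-checked with a short calculation. The sketch below is illustrative only and is not taken from the paper: it assumes a theoretical peak of 16 GFlops per core on the K-computer (128 GFlops per 8-core node), a value the abstract does not state, and the function names are hypothetical.

```python
# Sanity check of the performance figures quoted in the abstract.
# ASSUMPTION (not stated in the abstract): 16 GFlops theoretical peak
# per core, i.e. a 128 GFlops, 8-core K-computer node.
ASSUMED_PEAK_FLOPS_PER_CORE = 16e9


def sustained_fraction(sustained_flops, cores,
                       peak_per_core=ASSUMED_PEAK_FLOPS_PER_CORE):
    """Fraction of the (assumed) theoretical peak reached by a sustained rate."""
    return sustained_flops / (cores * peak_per_core)


def speedup(new_flops, old_flops):
    """Ratio of two sustained floating-point rates."""
    return new_flops / old_flops


if __name__ == "__main__":
    # ~307 TFlops on 196,608 cores (K-computer), from the abstract.
    frac = sustained_fraction(307e12, 196_608)
    # ~19 TFlops on 16,384 cores (BX900), from the abstract.
    ratio = speedup(307e12, 19e12)
    print(f"sustained fraction of assumed peak: {frac:.1%}")
    print(f"speedup over the BX900 result: {ratio:.1f}x")
```

Under the assumed per-core peak, the sustained fraction comes out close to the quoted ~10%, and the ratio of the two sustained rates is consistent with the quoted 16x.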
Pages: 73-86
Number of pages: 14
Related papers
27 in total
  • [1] Foundations of nonlinear gyrokinetic theory
    Brizard, A. J.
    Hahm, T. S.
    [J]. REVIEWS OF MODERN PHYSICS, 2007, 79 (02) : 421 - 468
  • [2] Consequences of profile shearing on toroidal momentum transport
    Camenen, Y.
    Idomura, Y.
    Jolliet, S.
    Peeters, A. G.
    [J]. NUCLEAR FUSION, 2011, 51 (07)
  • [3] The local limit of global gyrokinetic simulations
    Candy, J
    Waltz, RE
    Dorland, W
    [J]. PHYSICS OF PLASMAS, 2004, 11 (05) : L25 - L28
  • [4] Comparisons and physics basis of tokamak transport models and turbulence simulations
    Dimits, AM
    Bateman, G
    Beer, MA
    Cohen, BI
    Dorland, W
    Hammett, GW
    Kim, C
    Kinsey, JE
    Kotschenreuther, M
    Kritz, AH
    Lao, LL
    Mandrekas, J
    Nevins, WM
    Parker, SE
    Redd, AJ
    Shumaker, DE
    Sydora, R
    Weiland, J
    [J]. PHYSICS OF PLASMAS, 2000, 7 (03) : 969 - 983
  • [5] VARIATIONAL ITERATIVE METHODS FOR NONSYMMETRIC SYSTEMS OF LINEAR-EQUATIONS
    EISENSTAT, SC
    ELMAN, HC
    SCHULTZ, MH
    [J]. SIAM JOURNAL ON NUMERICAL ANALYSIS, 1983, 20 (02) : 345 - 357
  • [6] Gyrokinetic simulations of turbulent transport
    Garbet, X.
    Idomura, Y.
    Villard, L.
    Watanabe, T. H.
    [J]. NUCLEAR FUSION, 2010, 50 (04)
  • [7] A high-performance, portable implementation of the MPI message passing interface standard
    Gropp, W
    Lusk, E
    Doss, N
    Skjellum, A
    [J]. PARALLEL COMPUTING, 1996, 22 (06) : 789 - 828
  • [8] Study of ion turbulent transport and profile formations using global gyrokinetic full-f Vlasov simulation
    Idomura, Y.
    Urano, H.
    Aiba, N.
    Tokuda, S.
    [J]. NUCLEAR FUSION, 2009, 49 (06)
  • [9] Idomura Y, 2011, STATE PRACTICE REPOR
  • [10] Idomura Y, 2010, 23 INT AT EN AG FUS