Calculation of the high-energy neutron flux for anticipating errors and recovery techniques in exascale supercomputer centres

Cited by: 3
Authors
Asorey, Hernan [1, 2]
Mayo-Garcia, Rafael [3]
Affiliations
[1] Comis Nacl Energia Atom, Ctr Atom Bariloche, Med Phys Dept, Av E Bustillo 9500, RA-8400 San Carlos De Bariloche, Rio Negro, Argentina
[2] Comis Nacl Energia Atom, Ctr Atom Constituyentes, Inst Tecnol Detecc & Astroparticulas ITeDA, Av Gral Paz 1499, RA-1650 Buenos Aires, DF, Argentina
[3] Ctr Invest Energet Medioambientales & Tecnol CIEM, Technol Dept, Av Complutense 40, Madrid 28040, Spain
Keywords
Neutron flux; Supercomputing; HPC; Exascale; Atmospheric radiation; SOFT ERRORS; SIMULATION; SPECTRUM; DETECTOR; SYSTEMS; MODEL; MPI;
DOI
10.1007/s11227-022-04981-8
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The age of exascale computing has arrived, and the risks associated with neutrons and other atmospheric radiation are becoming more critical as computing power increases: this radiation is expected to reduce the mean time between failures. In this work, a new and detailed calculation of the neutron flux for energies above 50 MeV is presented. The calculation uses state-of-the-art Monte Carlo astroparticle techniques and includes real atmospheric profiles at each of the next 23 exascale supercomputing facilities. The atmospheric impact on the flux and its seasonal variations were observed and characterized, and the barometric coefficient for high-energy neutrons was obtained at each site. With these coefficients, the potential risk of errors associated with an increase in the flux of energetic neutrons, such as single event upsets or transients, and the corresponding failure-in-time rates can be anticipated from the atmospheric pressure alone, before resources are assigned to critical tasks at each exascale facility. For clarity, examples of how cosmic rays affect the failure rate are included, so that administrators can better anticipate which more or less restrictive actions to take to overcome errors.
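As an illustration of how a barometric coefficient could be used in this way, the following Python sketch scales a reference failure-in-time (FIT) rate with atmospheric pressure. It is a minimal sketch under stated assumptions, not the authors' method: it assumes the commonly used exponential barometric law, I(P) = I(P_ref) · exp(β (P − P_ref)) with β < 0, and it assumes the FIT rate is proportional to the high-energy neutron flux. The function names and every numeric value (coefficient, reference FIT, device count) are hypothetical placeholders, not results from the paper.

```python
import math


def relative_flux(beta_per_hpa, pressure_hpa, pressure_ref_hpa):
    """Flux at pressure_hpa relative to the flux at pressure_ref_hpa.

    Assumes the exponential barometric law with coefficient beta (1/hPa).
    beta is negative: lower pressure means less atmosphere overhead
    and therefore a higher neutron flux at ground level.
    """
    return math.exp(beta_per_hpa * (pressure_hpa - pressure_ref_hpa))


def scaled_fit(fit_ref, beta_per_hpa, pressure_hpa, pressure_ref_hpa):
    """FIT rate at pressure_hpa, ASSUMING FIT is proportional to the flux."""
    return fit_ref * relative_flux(beta_per_hpa, pressure_hpa, pressure_ref_hpa)


def mtbf_hours(fit_per_device, n_devices):
    """System mean time between failures in hours.

    One FIT is one failure per 1e9 device-hours, so the system failure
    rate is fit_per_device * n_devices / 1e9 failures per hour.
    """
    return 1e9 / (fit_per_device * n_devices)


if __name__ == "__main__":
    # All values below are hypothetical placeholders for illustration only.
    beta = -0.007          # 1/hPa, placeholder barometric coefficient
    p_ref = 1000.0         # hPa, pressure at which fit_ref was characterized
    fit_ref = 100.0        # FIT per device at p_ref
    n_devices = 10_000     # devices in the hypothetical system

    for p in (980.0, 1000.0, 1020.0):
        fit = scaled_fit(fit_ref, beta, p, p_ref)
        print(f"P = {p:7.1f} hPa  FIT/device = {fit:8.2f}  "
              f"MTBF = {mtbf_hours(fit, n_devices):8.1f} h")
```

Under these assumptions, a scheduler could compare the current pressure reading against the site's reference pressure and defer or checkpoint critical jobs more aggressively when the scaled FIT rate exceeds a chosen threshold.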
Pages: 8205-8235
Page count: 31
Related papers
111 in total
  • [61] Negative and Positive Muon-induced SEU Cross Sections in 28-nm and 65-nm Planar Bulk CMOS SRAMs
    Liao, Wang
    Hashimoto, Masanori
    Manabe, Seiya
    Watanabe, Yukinobu
    Abe, Shin-ichiro
    Nakano, Keita
    Takeshita, Hayato
    Tampo, Motonobu
    Takeshita, Soshi
    Miyake, Yasuhiro
    [J]. 2019 IEEE INTERNATIONAL RELIABILITY PHYSICS SYMPOSIUM (IRPS), 2019,
  • [62] Neutron radiation effects on an electronic system on module
    Lo Presti, Domenico
    Medina, Nilberto H.
    Guazzelli, Marcilei A.
    Moralles, Mauricio
    Aguiar, Vitor A. P.
    Oliveira, Jose R. B.
    Added, Nemitala
    Macchione, Eduardo L. A.
    Siqueira, Paulo de Tarso D.
    Zahn, Guilherme
    Genezini, Frederico
    Bonanno, Danilo
    Gallo, Giuseppe
    Russo, Salvatore
    Sgouros, Onoufrios
    Muoio, Annamaria
    Pandola, Luciano
    Cappuzzello, Francesco
    [J]. REVIEW OF SCIENTIFIC INSTRUMENTS, 2020, 91 (08)
  • [63] Fault tolerance of MPI applications in exascale systems: The ULFM solution
    Losada, Nuria
    Gonzalez, Patricia
    Martin, Maria J.
    Bosilca, George
    Bouteiller, Aurelien
    Teranishi, Keita
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2020, 106 : 467 - 481
  • [64] Superposed epoch study of ICME sub-structures near Earth and their effects on Galactic cosmic rays
    Masias-Meza, J. J.
    Dasso, S.
    Demoulin, P.
    Rodriguez, L.
    Janvier, M.
    [J]. ASTRONOMY & ASTROPHYSICS, 2016, 592
  • [65] A Heitler model of extensive air showers
    Matthews, J
    [J]. ASTROPARTICLE PHYSICS, 2005, 22 (5-6) : 387 - 397
  • [66] Using the LANSCE irradiation facility to predict the number of fatal soft errors in one of the world's fastest supercomputers
    Michalak, SE
    Harris, KW
    Hengartner, NW
    Takala, BE
    Wender, SA
    [J]. NUCLEAR INSTRUMENTS & METHODS IN PHYSICS RESEARCH SECTION B-BEAM INTERACTIONS WITH MATERIALS AND ATOMS, 2005, 241 (1-4) : 414 - 418
  • [67] Current status and possible extension of the global neutron monitor network
    Mishev, Alexander
    Usoskin, Ilya
    [J]. JOURNAL OF SPACE WEATHER AND SPACE CLIMATE, 2020, 10
  • [68] Error resilience of three GMRES implementations under fault injection
    Morinigo, Jose A.
    Bustos, Andres
    Mayo-Garcia, Rafael
    [J]. JOURNAL OF SUPERCOMPUTING, 2022, 78 (05) : 7158 - 7185
  • [69] On the modelling of optimal coordinated checkpoint period in supercomputers
    Morinigo, Jose A.
    Rodriguez-Pascual, Manuel
    Mayo-Garcia, Rafael
    [J]. JOURNAL OF SUPERCOMPUTING, 2019, 75 (02) : 930 - 954
  • [70] National Aerospace Administration (NASA), 1976, NOAA TECHNICAL REPOR