Optimal Checkpointing Period: Time vs. Energy

Cited by: 5
Authors
Aupy, Guillaume [1]
Benoit, Anne [1]
Herault, Thomas [2]
Robert, Yves [1,2]
Dongarra, Jack [2]
Affiliations
[1] Ecole Normale Supérieure de Lyon, LIP Laboratory, F-69364 Lyon, France
[2] University of Tennessee, Knoxville, TN 37996, USA
Source
High Performance Computing Systems: Performance Modeling, Benchmarking and Simulation | 2014, Vol. 8551
DOI
10.1007/978-3-319-10214-6_10
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
This short paper deals with parallel scientific applications that use non-blocking, periodic, coordinated checkpointing to enforce resilience. We provide a model and detailed formulas for the total execution time and the consumed energy. We characterize the optimal checkpointing period for both objectives, and we assess the range of time/energy trade-offs by instantiating the model with a set of realistic scenarios for Exascale systems. We place particular emphasis on I/O transfers, because the relative cost of communication, in terms of both latency and consumed energy, is expected to increase dramatically on future Exascale platforms.
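For orientation only, the sketch below computes the classical first-order optimal checkpointing period for the time objective (the Young/Daly approximation), which is the usual baseline that time-optimal period analyses start from. It is NOT the paper's model: it assumes blocking checkpoints, ignores energy entirely, and drops recovery and downtime terms (they add a constant to the waste and do not move the minimizer). The names `waste_fraction` and `young_daly_period` and the numeric values for the checkpoint cost `C` and the platform MTBF `mu` are hypothetical.

```python
import math


def waste_fraction(period: float, ckpt_cost: float, mtbf: float) -> float:
    """Simplified first-order waste (hypothetical model, blocking checkpoints).

    ckpt_cost / period  : fraction of each period spent writing the checkpoint
    period / (2 * mtbf) : expected re-executed work per unit time, since a
                          failure strikes on average halfway through a period
    Recovery and downtime are omitted; they only add a constant.
    """
    return ckpt_cost / period + period / (2.0 * mtbf)


def young_daly_period(ckpt_cost: float, mtbf: float) -> float:
    """Period minimizing waste_fraction: d/dT (C/T + T/(2*mu)) = 0 gives sqrt(2*C*mu)."""
    return math.sqrt(2.0 * ckpt_cost * mtbf)


if __name__ == "__main__":
    C = 600.0      # hypothetical checkpoint cost in seconds
    mu = 86400.0   # hypothetical platform MTBF in seconds (one day)
    T = young_daly_period(C, mu)
    print(f"T_opt ~ {T:.0f} s, waste ~ {waste_fraction(T, C, mu):.3f}")
```

At the optimum the two waste terms are equal, which is a quick sanity check. An energy-aware period, as studied in the paper, would weight checkpointing, I/O and computation by their respective power costs and generally lands at a different value than this time-only baseline.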
Pages: 203-214
Page count: 12