A JavaSpace-based Framework for Efficient Fault-Tolerant Master-Worker Distributed Applications

Cited by: 0
Authors
Galtier, Virginie [1]
Makassikis, Constantinos [1, 2]
Vialle, Stephane [1, 2]
Affiliations
[1] SUPELEC, UMI 2958, Gif Sur Yvette, France
[2] AlGorille INRIA Project Team, Gif Sur Yvette, France
Source
PROCEEDINGS OF THE 19TH INTERNATIONAL EUROMICRO CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING | 2011
Keywords
master-worker; framework; distributed fault tolerance; checkpointing; user-framework-middleware cooperation
DOI
10.1109/PDP.2011.82
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We propose a framework built around a JavaSpace to ease the development of bag-of-tasks applications. The framework can optionally and automatically tolerate transient crash failures occurring on any of the distributed elements. To do so, it relies on checkpointing and on mechanisms of the underlying middleware. To further improve checkpointing efficiency, in both size and frequency, the programmer can introduce intermediate user-defined checkpoint data and code within the task processing program. Used without fault tolerance, the framework accelerates application development, introduces no runtime overhead, and yields the expected speedup. With fault tolerance enabled, the framework lets applications complete correctly despite failures, with limited runtime and data storage overheads. Experiments run with up to 128 workers study the impact of some user-related and implementation-related parameters on overall performance, and reveal good performance for classical JavaSpace-based master-worker application profiles.
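The idea of intermediate user-defined checkpoints inside a task can be illustrated with a minimal sketch. It is not the framework's API and does not use the JavaSpaces API: the shared space is replaced by in-memory collections, the run is single-threaded for determinism, and every name (`Task`, `Checkpoint`, `work`, `run`, `crashAt`) is hypothetical. Returning the task to the bag on a simulated crash is an assumption that stands in for whatever recovery mechanism the middleware provides.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class CheckpointedBagOfTasks {
    // Hypothetical task: sum the integers in [from, to).
    static final class Task {
        final int id, from, to;
        Task(int id, int from, int to) { this.id = id; this.from = from; this.to = to; }
    }

    // Hypothetical user-defined checkpoint: next index to process and partial sum.
    static final class Checkpoint {
        final int next; final long partial;
        Checkpoint(int next, long partial) { this.next = next; this.partial = partial; }
    }

    // In-memory stand-ins for the shared space (assumption: not a real JavaSpace).
    static final BlockingQueue<Task> bag = new LinkedBlockingQueue<>();
    static final Map<Integer, Checkpoint> checkpoints = new ConcurrentHashMap<>();
    static final Map<Integer, Long> results = new ConcurrentHashMap<>();

    // Processes one task, resuming from its last checkpoint if one exists.
    // crashAt >= 0 simulates a worker crash just before processing that index.
    static boolean work(Task t, int checkpointEvery, int crashAt) {
        Checkpoint cp = checkpoints.get(t.id);
        int start = (cp == null) ? t.from : cp.next;
        long sum = (cp == null) ? 0L : cp.partial;
        for (int i = start; i < t.to; i++) {
            if (i == crashAt) {
                bag.add(t);      // the task becomes available to workers again
                return false;    // work done since the last checkpoint is lost
            }
            sum += i;
            if ((i + 1 - t.from) % checkpointEvery == 0) {
                checkpoints.put(t.id, new Checkpoint(i + 1, sum));
            }
        }
        results.put(t.id, sum);
        checkpoints.remove(t.id); // checkpoint no longer needed once the result is stored
        return true;
    }

    // Master and worker loop in one thread: fill the bag, drain it, combine results.
    static long run(int tasks, int size, int checkpointEvery, int crashAt) {
        bag.clear();
        checkpoints.clear();
        results.clear();
        for (int id = 0; id < tasks; id++) {
            bag.add(new Task(id, id * size, (id + 1) * size));
        }
        boolean crashed = false;
        Task t;
        while ((t = bag.poll()) != null) {
            // Inject at most one crash per run.
            if (!work(t, checkpointEvery, crashed ? -1 : crashAt)) {
                crashed = true;
            }
        }
        long total = 0L;
        for (long partial : results.values()) {
            total += partial;
        }
        return total;
    }

    public static void main(String[] args) {
        // Sum of 0..399 computed as 4 tasks of 100 integers, with one crash at index 57.
        System.out.println(run(4, 100, 10, 57)); // prints 79800
    }
}
```

In this sketch, a smaller `checkpointEvery` reduces the work redone after a crash but writes checkpoints more often, which mirrors the size and frequency trade-off the abstract mentions.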
Pages: 272-276
Page count: 5