Lightweight Consistent Recovery Algorithm for Sender-Based Message Logging in Distributed Systems

Cited by: 1
Authors
Ahn, Jinho [1]
Affiliations
[1] Kyonggi Univ, Dept Comp Sci, Suwon, Gyeonggi Do, South Korea
Keywords
distributed systems; fault-tolerance; message logging; checkpointing; scalability; consistent recovery
DOI
10.1587/transinf.E94.D.1712
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Sender-based message logging (SBML) with checkpointing has a well-known benefit: it greatly lowers the failure-free overhead of synchronous logging by logging messages in the sender's volatile memory. This benefit has encouraged its adoption in many distributed systems as a low-cost, transparent rollback-recovery technique. However, the original SBML recovery algorithm may fail to make progress under some transient communication errors. This paper proposes a consistent recovery algorithm that solves this problem by piggybacking a small amount of log information about received unstable messages on each acknowledgement message, which returns the receive sequence number assigned to a message by its receiver. Our algorithm also allows messages that are scheduled to be sent, but delayed by preceding unstable messages, to be transmitted much earlier than under the existing algorithms.
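The acknowledgement exchange the abstract refers to can be illustrated with a minimal Python sketch. It is not the paper's algorithm. It models only the general SBML handshake (the sender logs the message in volatile memory, the receiver returns a receive sequence number, the sender records it and confirms) plus one assumed reading of the piggybacking idea: each acknowledgement also carries the (sender, SSN) to RSN entries of other messages that are still unstable at the receiver. All names here (`Process`, `LogEntry`, `piggyback`, `learned`) are hypothetical, and the sketch keeps the original rule that a process with unstable messages delays its own sends; it does not model the earlier transmission the abstract claims.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Key = Tuple[str, int]  # (process name, send sequence number)


@dataclass
class LogEntry:
    """One message kept in the sender's volatile log (hypothetical layout)."""
    dest: str
    ssn: int
    payload: str
    rsn: Optional[int] = None  # filled in when the acknowledgement arrives


@dataclass
class Process:
    name: str
    ssn: int = 0  # send sequence number counter
    rsn: int = 0  # receive sequence number counter
    send_log: Dict[Key, LogEntry] = field(default_factory=dict)
    unstable: Dict[Key, int] = field(default_factory=dict)  # received, not yet confirmed
    learned: Dict[str, Dict[Key, int]] = field(default_factory=dict)  # piggybacked info

    def send(self, dest: str, payload: str):
        """Log the message in volatile memory, then hand it to the network."""
        self.ssn += 1
        self.send_log[(dest, self.ssn)] = LogEntry(dest, self.ssn, payload)
        return (self.name, self.ssn, payload)

    def receive(self, msg):
        """Assign an RSN and build the acknowledgement.

        Assumption: the ack piggybacks log info for the *other* messages
        that are still unstable at this receiver.
        """
        sender, ssn, _payload = msg
        self.rsn += 1
        piggyback = dict(self.unstable)  # snapshot before adding this message
        self.unstable[(sender, ssn)] = self.rsn
        return {"from": self.name, "ssn": ssn, "rsn": self.rsn, "piggyback": piggyback}

    def on_ack(self, ack):
        """Record the RSN in the log, keep piggybacked info, return a confirmation."""
        self.send_log[(ack["from"], ack["ssn"])].rsn = ack["rsn"]
        self.learned.setdefault(ack["from"], {}).update(ack["piggyback"])
        return (self.name, ack["ssn"])

    def on_confirm(self, confirm: Key):
        """The sender has logged the RSN, so the message is now stable."""
        self.unstable.pop(confirm, None)

    def can_send(self) -> bool:
        """Original SBML rule: no new sends while any received message is unstable."""
        return not self.unstable


# Scenario: p's acknowledgement is delayed, q's is delivered first.
p, q, r = Process("p"), Process("q"), Process("r")
ack_p = r.receive(p.send("r", "m1"))  # ack to p is held up in the network
ack_q = r.receive(q.send("r", "m2"))  # this ack also carries the entry for m1
r.on_confirm(q.on_ack(ack_q))
print(q.learned)     # q now holds a redundant copy of m1's RSN
print(r.can_send())  # False until p's handshake completes
r.on_confirm(p.on_ack(ack_p))
print(r.can_send())
```

In this sketch the piggybacked entry gives a second process a copy of the RSN that the delayed acknowledgement was supposed to deliver; how the actual algorithm uses such information during recovery is described in the paper itself, not here.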
Pages: 1712-1715
Number of pages: 4