Lightweight Consistent Recovery Algorithm for Sender-Based Message Logging in Distributed Systems

Times cited: 1
Author
Ahn, Jinho [1]
Affiliation
[1] Kyonggi Univ, Dept Comp Sci, Suwon, Gyeonggi Do, South Korea
Keywords
distributed systems; fault-tolerance; message logging; checkpointing; scalability; consistent recovery
DOI
10.1587/transinf.E94.D.1712
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Sender-based message logging (SBML) with checkpointing has a well-known benefit: it greatly lowers the failure-free overhead of synchronous logging by logging messages in the sender's volatile memory. This feature has encouraged its adoption in many distributed systems as a low-cost, transparent rollback-recovery technique. However, the original SBML recovery algorithm may fail to make progress in some cases of transient communication errors. This paper proposes a consistent recovery algorithm that solves this problem by piggybacking a small amount of log information about received unstable messages on each acknowledgement message, which returns the receive sequence number assigned to a message by its receiver. Our algorithm also allows messages that are scheduled to be sent, but delayed by preceding unstable messages, to be transmitted much earlier than with existing algorithms.
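The abstract contrasts the proposed algorithm with the original SBML protocol, whose send-blocking rule is what delays outgoing messages. The following is a minimal sketch of that baseline failure-free behavior only, not of the paper's piggybacking scheme, whose details are not given here. It assumes a single shared FIFO queue as the network, no failures, no checkpointing and no recovery; the class `Process`, the helpers `step` and `run`, and the message kinds `"MSG"`, `"ACK"` and `"ACK2"` are hypothetical names chosen for illustration. The sender logs the message content in volatile memory, the receiver assigns a receive sequence number (RSN) and returns it in an acknowledgement, the sender records the RSN and confirms, and until that confirmation arrives the receiver defers any new sends.

```python
from collections import deque


class Process:
    """One process in a failure-free sender-based message logging (SBML) run (hypothetical sketch)."""

    def __init__(self, name, net):
        self.name = name
        self.net = net         # shared FIFO queue of (kind, src, dst, payload) tuples
        self.next_ssn = 0      # send sequence number for the next outgoing message
        self.next_rsn = 0      # receive sequence number for the next delivered message
        self.log = {}          # volatile sender-side log: ssn -> [dest_name, data, rsn or None]
        self.unstable = set()  # (sender_name, ssn) of received messages whose RSN is not yet logged at the sender
        self.deferred = []     # (dest, data) sends held back while unstable messages exist
        self.delivered = []    # (rsn, sender_name, data) in delivery order

    def send(self, dest, data):
        # Baseline SBML rule: no new message may leave this process while any
        # message it has received is still unstable (not fully logged).
        if self.unstable:
            self.deferred.append((dest, data))
            return
        ssn = self.next_ssn
        self.next_ssn += 1
        self.log[ssn] = [dest.name, data, None]  # log content in volatile memory before sending
        self.net.append(("MSG", self, dest, (ssn, data)))

    def handle(self, kind, src, payload):
        if kind == "MSG":
            # Receiver assigns an RSN, delivers the message, and returns the RSN in an ACK.
            ssn, data = payload
            rsn = self.next_rsn
            self.next_rsn += 1
            self.delivered.append((rsn, src.name, data))
            self.unstable.add((src.name, ssn))
            self.net.append(("ACK", self, src, (ssn, rsn)))
        elif kind == "ACK":
            # Sender records the RSN next to the logged content: the message is now fully logged.
            ssn, rsn = payload
            self.log[ssn][2] = rsn
            self.net.append(("ACK2", self, src, ssn))
        elif kind == "ACK2":
            # Receiver learns its RSN is logged; once nothing is unstable, release deferred sends.
            self.unstable.discard((src.name, payload))
            if not self.unstable:
                pending, self.deferred = self.deferred, []
                for dest, data in pending:
                    self.send(dest, data)


def step(net):
    """Deliver the oldest in-flight message."""
    kind, src, dst, payload = net.popleft()
    dst.handle(kind, src, payload)


def run(net):
    """Deliver messages until the simulated network is empty."""
    while net:
        step(net)


net = deque()
a, b, c = Process("A", net), Process("B", net), Process("C", net)
a.send(b, "m1")
step(net)        # B delivers m1; m1 is unstable at B
b.send(c, "m2")  # deferred: B still holds the unstable m1
run(net)         # ACK and ACK2 complete the logging of m1, then m2 is released
print(b.log[0], c.delivered)
```

In this trace, B receives m1 from A and tries to send m2 to C before A has confirmed the RSN of m1, so the send is held back until the `"ACK2"` arrives. That waiting period is the kind of delay the abstract says the proposed algorithm shortens.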
Pages: 1712-1715
Number of pages: 4