Efficient recovery from communication errors in distributed shared memory systems

Cited by: 0
Authors
Lin, JW [1]
Kuo, SY [1]
Affiliations
[1] Natl Taiwan Univ, Dept Elect Engn, Taipei 10764, Taiwan
Source
Keywords
communication errors; distributed shared memory systems; damage; loss; retransmission latency
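DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract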
This paper investigates communication errors in distributed shared memory (DSM) systems. Communication errors cause two critical problems: damage and loss. With damage, transmitted data is corrupted, which leads to incorrect computational results. With loss, transmitted data disappears in transit and is never received. Because loss can be resolved easily with acknowledgements, we focus on handling damage efficiently. In DSM systems, the amount of data transferred between nodes is larger than the amount actually shared between them; that is, when a processing node receives data, it does not use every data item in what it received. Based on this property, we present a new technique for resolving the data damage problem in DSM systems. The technique lets a processing node that receives damaged data continue computing instead of blocking to wait for the correct data, so the latency of handling the damage can be hidden. The technique relies on an optimistic assumption, however; if that assumption does not hold, the latency is not hidden. To show the advantage and the overhead of the proposed technique, we perform extensive trace-driven simulations. The simulation results show that at least 62% of the latency for handling data damage can be hidden.
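The abstract does not spell out the mechanism or name its optimistic assumption, so the following Python sketch is only a hypothetical illustration of the general idea, not the authors' algorithm. It assumes that damage is detected with a CRC32 checksum, that the node records which offsets it reads from a damaged page, and that the optimistic assumption is "the items actually read were not among the damaged ones". The class `OptimisticPage` and its methods are invented names for this example.

```python
import zlib


def checksum(page: bytes) -> int:
    """CRC32 of a page; stands in for whatever error detection the system uses."""
    return zlib.crc32(page)


class OptimisticPage:
    """A received page that may be damaged (hypothetical illustration).

    While the page is flagged as damaged, every offset that is read is
    recorded so the reads can be validated once a correct copy arrives.
    """

    def __init__(self, data: bytes, expected_crc: int):
        self.data = bytearray(data)
        self.damaged = checksum(data) != expected_crc
        self.read_offsets = set()

    def read(self, offset: int) -> int:
        # Computation continues without blocking, even on a damaged page.
        if self.damaged:
            self.read_offsets.add(offset)
        return self.data[offset]

    def reconcile(self, correct: bytes) -> bool:
        """Install the retransmitted copy.

        Returns True if every value read so far matched the correct copy
        (the retransmission latency was hidden), or False if some read hit
        a damaged item and the dependent computation must be redone.
        """
        ok = all(self.data[i] == correct[i] for i in self.read_offsets)
        self.data = bytearray(correct)
        self.damaged = False
        self.read_offsets.clear()
        return ok
```

In this sketch, a node that reads only undamaged offsets before `reconcile` is called loses no time to the retransmission; a node that read a damaged offset gets `False` back and has to recompute, which corresponds to the case where the optimistic assumption fails.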
Pages: 1213-1223
Page count: 11
Related papers
50 in total
  • [1] Efficient recovery from communication errors in distributed shared memory systems
    Natl Taiwan Univ, Taipei, Taiwan
    IEICE Trans Inf Syst, 11 (1213-1223):
  • [2] Fault recovery for distributed shared memory systems
    Dieter, WR
    Lumpp, JE
    1997 IEEE AEROSPACE CONFERENCE PROCEEDINGS, VOL 2, 1997, : 525 - 540
  • [3] An efficient logging and recovery scheme for lazy release consistent distributed shared memory systems
    Park, T
    Yeom, HY
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2000, 17 (03): : 265 - 278
  • [4] An efficient logging scheme for recoverable distributed shared memory systems
    Park, T
    Cho, S
    Yeom, HY
    PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS, 1997, : 305 - 313
  • [5] Analysis of failure recovery schemes for distributed shared-memory systems
    Kim, JH
    Vaidya, NH
    IEE PROCEEDINGS-COMPUTERS AND DIGITAL TECHNIQUES, 1999, 146 (03): : 125 - 130
  • [6] ENSURING CORRECT ROLLBACK RECOVERY IN DISTRIBUTED SHARED-MEMORY SYSTEMS
    JANSSENS, B
    FUCHS, WK
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 1995, 29 (02) : 211 - 218
  • [7] An efficient causal logging scheme for recoverable distributed shared memory systems
    Park, T
    Lee, I
    Yeom, HY
    PARALLEL COMPUTING, 2002, 28 (11) : 1549 - 1572
  • [8] A low overhead logging scheme for fast recovery in distributed shared memory systems
    Park, T
    Yeom, HY
    JOURNAL OF SUPERCOMPUTING, 2000, 15 (03): : 295 - 320
  • [9] A Low Overhead Logging Scheme for Fast Recovery in Distributed Shared Memory Systems
    Taesoon Park
    Heon Y. Yeom
    The Journal of Supercomputing, 2000, 15 : 295 - 320
  • [10] Reconfigurable interconnection networks in Distributed Shared Memory systems: a study on communication patterns
    Khoi, Bui Viet
    Tinh, Pham Doan
    Quan, Nguyen Nam
    Artudo, Inigo
    Manjarres, Daniel
    Heirman, Wim
    Debaes, Christof
    Dambre, Joni
    Van Campenhout, Jan
    Thienpont, Hugo
    2006 FIRST INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND ELECTRONICS, 2006, : 343 - +