Efficient recovery from communication errors in distributed shared memory systems

Authors
Lin, JW [1]
Kuo, SY [1]
Affiliations
[1] Natl Taiwan Univ, Dept Elect Engn, Taipei 10764, Taiwan
Keywords
communication errors; distributed shared memory systems; damage; loss; retransmission latency
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper investigates communication errors in distributed shared memory (DSM) systems. Communication errors introduce two critical problems: damage and loss. Damage corrupts the transmitted data and thus produces incorrect computational results. Loss means that the transmitted data disappears in transit and is never received. Because loss can be easily resolved with acknowledgements, we focus on handling damage efficiently. In DSM systems, the amount of data transferred between nodes is larger than the amount actually shared between them; that is, when a processing node receives data, it does not use every data item in it. Based on this property, we present a new technique for resolving the data damage problem in DSM systems. When a processing node receives damaged data, the technique allows it to continue computing instead of blocking to wait for the correct data, so the latency of handling the damage can be hidden. The technique relies on an optimistic assumption, however; if that assumption does not hold, the latency is not hidden. To show the advantage and overhead of the proposed technique, we perform extensive trace-driven simulations. The results show that at least 62% of the latency for handling data damage can be hidden.
Pages: 1213-1223
Page count: 11