Efficient recovery from communication errors in distributed shared memory systems

Times cited: 0
Authors
Lin, JW [1]
Kuo, SY [1]
Affiliation
[1] Natl Taiwan Univ, Dept Elect Engn, Taipei 10764, Taiwan
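Source
Keywords
communication errors; distributed shared memory systems; damage; loss; retransmission latency
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract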
This paper investigates the problem of communication errors in distributed shared memory (DSM) systems. Communication errors can introduce two critical problems: damage and loss. Damage corrupts the transmitted data and thus leads to incorrect computational results. Loss means the transmitted data disappears in transit and is never received. The loss problem, however, can be easily resolved with acknowledgements, so we focus on how to handle the damage problem efficiently. In DSM systems, the amount of data transferred between nodes is larger than the amount actually shared between them; that is, when a processing node receives data, not every data item in it will be used. Based on this property, we present a new technique to resolve the data damage problem in DSM systems. When a processing node receives damaged data, the technique lets it continue computing instead of blocking to wait for the correct data, so the latency of handling the damage can be hidden. The technique does, however, rely on an optimistic assumption; if that assumption does not hold, the latency is not hidden. To evaluate both the advantages and the overhead of the proposed technique, we perform extensive trace-driven simulations. The simulation results show that at least 62% of the latency for handling data damage can be hidden.
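The non-blocking idea in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state:

- Damage is detected per fixed-size block using a CRC32 checksum computed by the sender.
- The sender can resend individual blocks.
- The "optimistic assumption" means that damaged items are not read before their clean copies arrive.

The names `make_page`, `ReceivedPage`, `OptimisticReceiver`, `send_request` and `wait_for` are hypothetical and are not the authors' implementation. The receiver requests resends without blocking and stalls only when a still-damaged block is actually read.

```python
from __future__ import annotations

import zlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReceivedPage:
    """Data split into fixed-size blocks, each shipped with a sender-side
    CRC32 (hypothetical wire format; the paper does not specify one)."""
    blocks: list[bytes]
    checksums: list[int]


def make_page(data: bytes, block_size: int) -> ReceivedPage:
    """Sender side: cut the data into blocks and checksum each block."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return ReceivedPage(blocks, [zlib.crc32(b) for b in blocks])


class OptimisticReceiver:
    """Keeps computing on damaged data; blocks only when a damaged block
    is read before its retransmission has arrived."""

    def __init__(self, send_request: Callable[[int], None],
                 wait_for: Callable[[int], bytes]):
        self.send_request = send_request  # hypothetical non-blocking resend request
        self.wait_for = wait_for          # hypothetical blocking wait for one block
        self.page: ReceivedPage | None = None
        self.pending: set[int] = set()    # indices of blocks still damaged
        self.stalls = 0                   # times the optimistic assumption failed

    def receive(self, page: ReceivedPage) -> None:
        self.page = page
        self.pending = {i for i, (b, crc) in enumerate(zip(page.blocks, page.checksums))
                        if zlib.crc32(b) != crc}
        for i in sorted(self.pending):
            self.send_request(i)          # fire and forget: computation continues

    def on_retransmission(self, i: int, data: bytes) -> None:
        # Accept a resent block only if it now passes its checksum.
        if zlib.crc32(data) == self.page.checksums[i]:
            self.page.blocks[i] = data
            self.pending.discard(i)

    def read(self, i: int) -> bytes:
        if i in self.pending:
            # Damaged block needed now: its latency cannot be hidden.
            self.stalls += 1
            while i in self.pending:      # retry until a clean copy arrives
                self.on_retransmission(i, self.wait_for(i))
        return self.page.blocks[i]


# --- Usage: blocks 1 and 3 are damaged in transit ---
original = bytes(range(256))
sent = make_page(original, block_size=64)              # 4 blocks of 64 bytes
arrived = ReceivedPage(list(sent.blocks), list(sent.checksums))
arrived.blocks[1] = b"\x00" * 64                       # simulated damage
arrived.blocks[3] = b"\xff" * 64

requests: list[int] = []
receiver = OptimisticReceiver(send_request=requests.append,
                              wait_for=lambda i: sent.blocks[i])
receiver.receive(arrived)             # asks for blocks 1 and 3, keeps going
receiver.read(0)
receiver.read(2)                      # clean blocks: no stall
receiver.on_retransmission(3, sent.blocks[3])  # resend arrives before use
receiver.read(3)                      # latency hidden: no stall
receiver.read(1)                      # damaged block used first: one stall
```

In this run, only the read of block 1 stalls. Block 3's retransmission arrives before it is used, so its repair latency is hidden. This mirrors the abstract's point that the latency is hidden only when the optimistic assumption holds.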
Pages: 1213-1223
Page count: 11