Improving both fault tolerance and performance by replicating control data in scope consistent software DSMs

Cited by: 0
Authors
de Melo, ACMA [1]
da Silva, LN [1]
Affiliation
[1] Univ Brasilia, Dept Comp Sci, Brasilia, DF, Brazil
Source
PARALLEL AND DISTRIBUTED COMPUTING SYSTEMS | 2002
Keywords
Distributed shared memory; replication mechanisms; checkpointing support
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Using the shared memory programming paradigm on distributed architectures that have no physically shared memory requires an abstraction known as Distributed Shared Memory (DSM). Scope-consistent software DSMs provide a relaxed memory model that guarantees consistency only at synchronization operations, on a per-lock basis. Because the main goal of DSM systems is to support long-running, computation-intensive applications, fault tolerance support is highly desirable. This article presents and evaluates a mechanism that replicates scope-consistent DSM control data on remote nodes in order to provide fault tolerance support. Our results on some popular benchmarks show that the overhead introduced by the proposed mechanism is low and that, surprisingly, some applications ran faster with the replication mechanism than without it: the replicas it creates are guaranteed to be up to date and can be used by the application, which reduces internode communication overhead.
Pages: 507-512
Page count: 6