Smart Redundancy for Distributed Computation

被引:26
作者
Brun, Yuriy [1 ]
Edwards, George [2 ]
Bang, Jae Young [3 ]
Medvidovic, Nenad [3 ]
机构
[1] Univ Washington, Seattle, WA 98195 USA
[2] Blue Cell Software, Los Angeles, CA USA
[3] Univ Southern Calif, Los Angeles, CA 90089 USA
来源
31ST INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2011) | 2011年
基金
美国国家科学基金会;
关键词
TOLERANCE;
D O I
10.1109/ICDCS.2011.25
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Many distributed software systems allow participation by large numbers of untrusted, potentially faulty components on an open network. As faults are inevitable in this setting, these systems utilize redundancy and replication to achieve fault tolerance. In this paper, we present a novel "smart" redundancy technique called iterative redundancy, which ensures efficient replication of computation and data given finite processing and storage resources, even when facing Byzantine faults. Iterative redundancy is more efficient and more adaptive than comparable state-of-the-art techniques that operate in environments with unknown system resource reliability. We show how systems that solve computational problems using a network of independent nodes can benefit from iterative redundancy. We present a formal analytical analysis and an empirical analysis, demonstrate iterative redundancy on a real-world volunteer-computing system, and compare it to existing methods.
引用
收藏
页码:665 / 676
页数:12
相关论文
共 28 条
[21]   The LHC computing grid project at CERN [J].
Lamanna, M .
NUCLEAR INSTRUMENTS & METHODS IN PHYSICS RESEARCH SECTION A-ACCELERATORS SPECTROMETERS DETECTORS AND ASSOCIATED EQUIPMENT, 2004, 534 (1-2) :1-6
[22]   THE BYZANTINE GENERALS PROBLEM [J].
LAMPORT, L ;
SHOSTAK, R ;
PEASE, M .
ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS, 1982, 4 (03) :382-401
[23]   A blueprint for introducing disruptive technology into the Internet [J].
Peterson, L ;
Anderson, T ;
Culler, D ;
Roscoe, T .
ACM SIGCOMM COMPUTER COMMUNICATION REVIEW, 2003, 33 (01) :59-64
[24]   Fault tolerance-genetic algorithm for grid task scheduling using check point [J].
Priya, S. Baghavathi ;
Prakash, M. ;
Dhawan, K. K. .
SIXTH INTERNATIONAL CONFERENCE ON GRID AND COOPERATIVE COMPUTING, PROCEEDINGS, 2007, :676-+
[25]   Sabotage-tolerance mechanisms for volunteer computing systems [J].
Sarmenta, LFG .
FUTURE GENERATION COMPUTER SYSTEMS, 2002, 18 (04) :561-572
[26]  
SCHNEIDER FB, 1990, COMPUT SURV, V22, P299, DOI 10.1145/98163.98167
[27]  
Setiawan A, 2004, P 5 INT C PAR DISTR
[28]  
Sipser M., 2012, Introduction to the theory of computation.