Scalable fault-tolerant aggregation in large process groups

被引:49
作者
Gupta, I [1 ]
van Renesse, R [1 ]
Birman, KP [1 ]
机构
[1] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
来源
INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS, PROCEEDINGS | 2001年
关键词
D O I
10.1109/DSN.2001.941427
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
This paper discusses fault-tolerant, scalable solutions to the problem of accurately and scalably calculating global aggregate functions in large process groups communicating over unreliable networks. These groups could represent sensors or processes communicating over a network that is either fixed (eg., the Internet) or dynamic (eg., multihop ad-hoc). Group members are prone to failures. The ability to evaluate global aggregate properties (eg., the average of sensor temperature readings) is important for higher-level coordination activities in such large groups. We first define the setting and problem, laying down metrics to evaluate different algorithms for the same. We discuss why the usual approaches to solve this problem are unviable and unscalable over an unreliable network prone to message delivery failures and crash failures. We then propose a technique to impose an abstract hierarchy, on such large groups, describing how this hierarchy can be made to mirror the network topology. We discuss several alternatives to use this technique to solve the global aggregate function evaluation problem. Finally, we present a protocol based on gossiping that uses this hierarchical technique. We present mathematical analysis and performance results to validate the robustness, efficiency and accuracy of the Hierarchical Gossiping algorithm.
引用
收藏
页码:433 / 442
页数:4
相关论文
共 16 条
[1]  
[Anonymous], P 1997 ACM SIGMOD IN
[2]  
BAILEY N, 1975, EPIDEMIC THEORY INFE
[3]  
BARJOSEPH Z, 1998, P 17 ACM S PRINC DIS, P193, DOI DOI 10.1145/277697.277733
[4]   Bimodal multicast [J].
Birman, KP ;
Hayden, M ;
Ozkasap, O ;
Xiao, Z ;
Budiu, M ;
Minsky, Y .
ACM TRANSACTIONS ON COMPUTER SYSTEMS, 1999, 17 (02) :41-88
[5]  
COORE RWD, 1997, 1614 AI MASS I TECHN
[6]  
DEMERS A, 2000, COMMUNICATION
[7]  
ESTRIN D, 2000, P 6 INT C MOB COMP N, P263
[8]   IMPOSSIBILITY OF DISTRIBUTED CONSENSUS WITH ONE FAULTY PROCESS [J].
FISCHER, MJ ;
LYNCH, NA ;
PATERSON, MS .
JOURNAL OF THE ACM, 1985, 32 (02) :374-382
[9]   Methods for observing global properties in distributed systems [J].
Garg, VK .
IEEE CONCURRENCY, 1997, 5 (04) :69-+
[10]  
KAHN JM, 2000, P 5 INT C MOB COMP N, P271