Towards highly reliable enterprise network services via inference of multi-level dependencies

Cited by: 118
Authors
Bahl, Paramvir [1]
Chandra, Ranveer [1]
Greenberg, Albert [1]
Kandula, Srikanth [1]
Maltz, David A. [1]
Zhang, Ming [1]
Affiliations
[1] Microsoft Corp, Res, Redmond, WA 98052 USA
Keywords
management; network & service management; dependencies; fault localization; probabilistic inference
DOI
10.1145/1282427.1282383
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Localizing the sources of performance problems in large enterprise networks is extremely challenging. Dependencies are numerous, complex and inherently multi-level, spanning hardware and software components across the network and the computing infrastructure. To exploit these dependencies for fast, accurate problem localization, we introduce an Inference Graph model, which is well-adapted to user-perceptible problems rooted in conditions giving rise to both partial service degradation and hard faults. Further, we introduce the Sherlock system to discover Inference Graphs in the operational enterprise, infer critical attributes, and then leverage the result to automatically detect and localize problems. To illuminate strengths and limitations of the approach, we provide results from a prototype deployment in a large enterprise network, as well as from testbed emulations and simulations. In particular, we find that taking into account multi-level structure leads to a 30% improvement in fault localization, as compared to two-level approaches.
Pages: 13-24
Page count: 12