Recent Advances in Fault Localization in Computer Networks

Cited by: 49
Authors
Dusia, Ayush [1]
Sethi, Adarshpal S. [1]
Affiliations
[1] Univ Delaware, Dept Comp & Informat Sci, Newark, DE 19716 USA
Keywords
Fault localization; fault detection; network management; survey; fault diagnosis; passive monitoring; active monitoring; overlay and virtual networks; communication systems; diagnosis; identification; algorithms
DOI
10.1109/COMST.2016.2570599
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Fault localization, a core element in network fault management, is the process of inferring the exact failure in a network from the set of observed symptoms. Since faults in network systems can be unavoidable, their quick and accurate detection and diagnosis are important for the stability, consistency, and performance of a communication system. In this paper, we discuss the challenges of fault localization in complex communication systems and present an overview of recent techniques proposed in the literature along with their advantages and limitations. We start by briefly surveying passive monitoring techniques, which were previously reviewed in a survey by Steinder. We then describe more recent fault localization research in five categories: 1) active monitoring techniques; 2) techniques for overlay and virtual networks; 3) decentralized probabilistic management techniques; 4) temporal correlation techniques; and 5) learning techniques.
Pages: 3030-3051
Page count: 22