Pinpoint: Problem determination in large, dynamic Internet services

Cited by: 343
Authors
Chen, MY [1]
Kiciman, E [1]
Fratkin, E [1]
Fox, A [1]
Brewer, E [1]
Affiliations
[1] Univ Calif Berkeley, Div Comp Sci, Berkeley, CA 94720 USA
Source
INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS, PROCEEDINGS | 2002
Keywords
problem determination; problem diagnosis; root cause analysis; data clustering; data mining algorithms;
DOI
10.1109/DSN.2002.1029005
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Traditional problem determination techniques rely on static dependency models that are difficult to generate accurately in today's large, distributed, and dynamic application environments such as e-commerce systems. In this paper, we present a dynamic analysis methodology that automates problem determination in these environments by 1) coarse-grained tagging of numerous real client requests as they travel through the system and 2) using data mining techniques to correlate the believed failures and successes of these requests to determine which components are most likely to be at fault. To validate our methodology, we have implemented Pinpoint, a framework for root cause analysis on the J2EE platform that requires no knowledge of the application components. Pinpoint consists of three parts: a communications layer that traces client requests, a failure detector that uses traffic-sniffing and middleware instrumentation, and a data analysis engine. We evaluate Pinpoint by injecting faults into various application components and show that Pinpoint identifies the faulty components with high accuracy and produces few false-positives.
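The two steps named in the abstract, tagging requests with the components they touch and then correlating request outcomes with those components, can be illustrated with a small sketch. This is not Pinpoint's analysis engine: the abstract only says "data mining techniques", so the per-component failure ratio used here is a simplified stand-in, and the function name `rank_components`, the trace format, and the component names are all hypothetical.

```python
from collections import defaultdict


def rank_components(traces):
    """Rank components by the fraction of their requests that failed.

    `traces` is an iterable of (components, failed) pairs, where
    `components` is the set of component names a request touched and
    `failed` is the failure detector's verdict for that request.
    This trace format is an assumption made for illustration.
    """
    used = defaultdict(int)         # requests that touched each component
    failed_with = defaultdict(int)  # failed requests that touched it

    for components, failed in traces:
        for name in set(components):
            used[name] += 1
            if failed:
                failed_with[name] += 1

    scores = {name: failed_with[name] / used[name] for name in used}
    # Highest failure ratio first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical traces: every request that used "cart" failed.
traces = [
    ({"frontend", "cart", "db"}, True),
    ({"frontend", "catalog", "db"}, False),
    ({"frontend", "cart"}, True),
    ({"frontend", "catalog"}, False),
]

ranking = rank_components(traces)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy data the component named "cart" ranks first because it appears only in failed requests, while components shared by failed and successful requests score lower. A ratio like this cannot separate components that always occur together, which is one reason a real analysis would need something stronger, such as the data clustering listed in the keywords.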
Pages: 595-604
Page count: 10