Weighted Sampling of Execution Traces: Capturing More Needles and Less Hay

Cited by: 23
Authors
Las-Casas, Pedro [1]
Mace, Jonathan [2]
Guedes, Dorgival [1]
Fonseca, Rodrigo [3]
Affiliations
[1] Univ Fed Minas Gerais, Belo Horizonte, MG, Brazil
[2] MPI SWS, Saarbrucken, Germany
[3] Brown Univ, Providence, RI 02912 USA
Source
PROCEEDINGS OF THE 2018 ACM SYMPOSIUM ON CLOUD COMPUTING (SOCC '18) | 2018
Keywords
distributed tracing; weighted sampling
DOI
10.1145/3267809.3267841
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
End-to-end tracing has recently emerged as a valuable tool for improving the dependability of distributed systems, by performing dynamic verification and diagnosing correctness and performance problems. Unlike logging, end-to-end traces enable coherent sampling of the entire execution of specific requests, and many deployments exploit this to reduce the overhead and storage requirements of tracing. This sampling, however, is usually done uniformly at random, which dedicates a large fraction of the sampling budget to common, 'normal' executions, while missing infrequent, but sometimes important, erroneous or anomalous executions. In this paper we define the representative trace sampling problem, and present a new approach, based on clustering of execution graphs, that biases the sampling of requests toward infrequent patterns in order to maximize the diversity of the execution traces stored. In this preliminary but encouraging work, we show how our approach chooses to persist representative and diverse executions, even when anomalous ones are very infrequent.
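The idea of biasing a fixed sampling budget toward infrequent execution patterns can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract only states that the approach is based on clustering execution graphs. The sketch assumes each trace has already been assigned a cluster label (the hypothetical `signature` strings below), weights each trace by the inverse of its cluster size, and then draws a weighted sample without replacement using exponent keys (the Efraimidis-Spirakis method, swapped in here for illustration).

```python
import random
from collections import Counter


def inverse_frequency_weights(signatures):
    """Weight each trace by 1 / (size of its cluster), so that every
    cluster carries the same total weight regardless of how common it is."""
    counts = Counter(signatures)
    return [1.0 / counts[s] for s in signatures]


def weighted_sample(items, weights, k, rng):
    """Draw k items without replacement, with probability biased by weight.
    Each item gets the key u ** (1 / w) for u uniform in [0, 1);
    the k items with the largest keys are kept."""
    keyed = [(rng.random() ** (1.0 / w), item) for item, w in zip(items, weights)]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in keyed[:k]]


# Hypothetical workload: 990 'normal' traces and 10 rare 'error' traces.
traces = [(i, "normal") for i in range(990)] + [(990 + i, "error") for i in range(10)]
signatures = [sig for _, sig in traces]
weights = inverse_frequency_weights(signatures)

rng = random.Random(0)
sample = weighted_sample(traces, weights, k=20, rng=rng)
print(Counter(sig for _, sig in sample))
```

Under uniform sampling, a budget of 20 out of these 1000 traces would contain 0.2 error traces on average; with inverse-cluster-size weights, the rare cluster competes for the budget on equal footing with the common one.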
Pages: 326-332
Page count: 7