Horus: Non-Intrusive Causal Analysis of Distributed Systems Logs

Cited by: 1
Authors
Neves, Francisco [1, 2]
Machado, Nuno [1, 3]
Vilaca, Ricardo [1, 2]
Pereira, Jose [1, 2]
Affiliations
[1] INESC TEC, Braga, Portugal
[2] U Minho, Braga, Portugal
[3] Amazon, Madrid, Spain
Source
51ST ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS (DSN 2021) | 2021
DOI
10.1109/DSN48987.2021.00035
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Logs remain the primary resource for debugging distributed system executions. However, the complexity and heterogeneity of modern distributed systems make log analysis extremely challenging, for two reasons. First, the sheer volume of messages means that the execution paths of distinct system components appear interleaved. Second, because physical clocks are unsynchronized, simply ordering log messages by timestamp does not yield a causal trace of the execution. To address these issues, we present Horus, a system that refines distributed system logs in a causally consistent and scalable fashion. Horus uses kernel-level probing to capture events that track causality between application-level logs from multiple sources. The events are then encoded as a directed acyclic graph and stored in a graph database, which allows rich query languages to be used to reason about runtime behavior. Our case study with TrainTicket, a ticket booking application with 40+ microservices, shows that Horus surpasses widely adopted log analysis systems in pinpointing the root cause of anomalies in distributed executions. We also show that Horus builds a causally consistent log of a distributed execution with much higher performance (up to 3 orders of magnitude) and scalability than prior state-of-the-art solutions. Finally, we show that Horus's approach to querying causality is up to 30 times faster than the built-in traversal algorithms of graph databases.
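
The idea of encoding events as a directed acyclic graph and querying it for causality can be illustrated with a minimal sketch. This is not the Horus implementation: the event fields (`id`, `proc`, `seq`, `kind`, `msg`, `ts`), the `SND`/`RCV`/`LOG` labels, and all function names are hypothetical, and an in-memory dictionary stands in for the graph database. The sketch only shows why a happens-before graph recovers an order that skewed timestamps get wrong.

```python
from collections import defaultdict, deque

# Hypothetical events: process B's clock runs behind A's, so B's timestamps
# are smaller even though B receives a message that A sent.
EVENTS = [
    {"id": "a1", "proc": "A", "seq": 1, "kind": "LOG", "msg": None, "ts": 100},
    {"id": "a2", "proc": "A", "seq": 2, "kind": "SND", "msg": "m1", "ts": 105},
    {"id": "b1", "proc": "B", "seq": 1, "kind": "RCV", "msg": "m1", "ts": 90},
    {"id": "b2", "proc": "B", "seq": 2, "kind": "LOG", "msg": None, "ts": 95},
]

def build_causal_graph(events):
    """Return adjacency sets holding program-order and send->receive edges."""
    edges = defaultdict(set)
    by_proc = defaultdict(list)
    sends = {}
    for e in events:
        by_proc[e["proc"]].append(e)
        if e["kind"] == "SND":
            sends[e["msg"]] = e["id"]
    # Program order: consecutive events of the same process.
    for evs in by_proc.values():
        evs.sort(key=lambda e: e["seq"])
        for a, b in zip(evs, evs[1:]):
            edges[a["id"]].add(b["id"])
    # Inter-process order: a send happens before the matching receive.
    for e in events:
        if e["kind"] == "RCV" and e["msg"] in sends:
            edges[sends[e["msg"]]].add(e["id"])
    return edges

def causal_order(events, edges):
    """Topological sort (Kahn's algorithm) of the happens-before graph."""
    indeg = {e["id"]: 0 for e in events}
    for dsts in edges.values():
        for d in dsts:
            indeg[d] += 1
    ready = deque(sorted(i for i, n in indeg.items() if n == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for d in sorted(edges.get(n, ())):
            indeg[d] -= 1
            if indeg[d] == 0:
                ready.append(d)
    return order

def happens_before(edges, a, b):
    """True if a path leads from event a to event b in the graph."""
    stack, seen = [a], set()
    while stack:
        n = stack.pop()
        for d in edges.get(n, ()):
            if d == b:
                return True
            if d not in seen:
                seen.add(d)
                stack.append(d)
    return False

edges = build_causal_graph(EVENTS)
by_timestamp = [e["id"] for e in sorted(EVENTS, key=lambda e: e["ts"])]
by_causality = causal_order(EVENTS, edges)
```

In this toy input, sorting by timestamp places the receive `b1` before its send `a2`, whereas the topological order keeps every send ahead of its receive. The plain depth-first search in `happens_before` plays the role of a generic graph traversal; the abstract reports that Horus answers such causality queries with its own, faster approach, which this sketch does not attempt to reproduce.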
Pages: 212-223
Number of pages: 12