TraceCRL: Contrastive Representation Learning for Microservice Trace Analysis

Cited by: 13
Authors
Zhang, Chenxi [1, 2, 3]
Peng, Xin [1, 2, 3]
Zhou, Tong [1, 2, 3]
Sha, Chaofeng [1, 2, 3]
Yan, Zhenghui [1, 2, 3]
Chen, Yiru [1, 2, 3]
Yang, Hong [1, 2, 3]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Shanghai Key Lab Data Sci, Shanghai, Peoples R China
[3] Shanghai Collaborat Innovat Ctr Intelligent Visua, Shanghai, Peoples R China
Source
PROCEEDINGS OF THE 30TH ACM JOINT MEETING EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, ESEC/FSE 2022 | 2022
Keywords
Microservice; Tracing; Graph Neural Network; Deep Learning; Contrastive Learning
DOI
10.1145/3540250.3549146
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Due to the large volume and high complexity of trace data, microservice trace analysis tasks such as anomaly detection, fault diagnosis, and tail-based sampling widely adopt machine learning. These trace analysis approaches usually rely on a preprocessing step that maps the structured features of traces to vector representations in an ad hoc way. As a result, they may lose important information such as the topological dependencies between service operations. In this paper, we propose TraceCRL, a trace representation learning approach based on contrastive learning and graph neural networks, which can incorporate graph-structured information into downstream trace analysis tasks. Given a trace, TraceCRL constructs an operation invocation graph in which nodes represent service operations and edges represent operation invocations, together with predefined features for invocation status and related metrics. Based on the operation invocation graphs of traces, TraceCRL uses a contrastive learning method to train a graph neural network-based model for trace representation. In particular, TraceCRL employs six trace data augmentation strategies to alleviate the problems of class collision and uniformity of representation in contrastive learning. Our experimental studies show that TraceCRL can significantly improve the performance of trace anomaly detection and offline trace sampling. They also confirm the effectiveness of the trace augmentation strategies and the efficiency of TraceCRL.
Pages: 1221-1232
Number of pages: 12
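The graph construction described in the abstract (nodes for service operations, edges for invocations carrying status and metric features) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the span fields (`span_id`, `parent_id`, `service`, `operation`, `duration_ms`, `is_error`) and the edge features (call count, error count, total latency) are assumptions chosen for illustration, since the abstract does not list the actual predefined features.

```python
from collections import defaultdict


def build_operation_invocation_graph(spans):
    """Build a simple operation invocation graph from one trace.

    `spans` is a list of dicts with hypothetical keys: span_id, parent_id,
    service, operation, duration_ms, is_error. Returns (nodes, edges), where
    nodes is a sorted list of "service:operation" labels and edges maps
    (caller, callee) pairs to aggregated illustrative features.
    """
    by_id = {s["span_id"]: s for s in spans}

    def label(span):
        # One node per distinct service operation.
        return f'{span["service"]}:{span["operation"]}'

    nodes = sorted({label(s) for s in spans})
    edges = defaultdict(lambda: {"calls": 0, "errors": 0, "total_ms": 0.0})

    for span in spans:
        parent = by_id.get(span["parent_id"])
        if parent is None:
            continue  # root span, or parent missing from the trace
        feat = edges[(label(parent), label(span))]
        feat["calls"] += 1                      # invocation count
        feat["errors"] += int(span["is_error"])  # stand-in for invocation status
        feat["total_ms"] += span["duration_ms"]  # stand-in for a latency metric

    return nodes, dict(edges)
```

A graph in this form could then be fed to a graph neural network encoder; the contrastive training and the six augmentation strategies mentioned in the abstract are outside the scope of this sketch.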