LatenSeer: Causal Modeling of End-to-End Latency Distributions by Harnessing Distributed Tracing

Times cited: 4
Authors
Zhang, Yazhuo [1]
Isaacs, Rebecca [2]
Yue, Yao [3]
Yang, Juncheng [4]
Zhang, Lei [5]
Vigfusson, Ymir [1, 6]
Affiliations
[1] Emory Univ, Atlanta, GA 30322 USA
[2] Amazon Web Serv, Seattle, WA USA
[3] IOP Syst, San Francisco, CA USA
[4] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[5] Princeton Univ, Princeton, NJ 08544 USA
[6] Keystrike, Atlanta, GA USA
Source
PROCEEDINGS OF THE 2023 ACM SYMPOSIUM ON CLOUD COMPUTING, SOCC 2023 | 2023
Keywords
microservices; distributed tracing; latency estimation; end-to-end latency; INFERENCE
DOI
10.1145/3620678.3624787
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
End-to-end latency estimation in web applications is crucial for system operators to foresee the effects of potential changes, helping ensure system stability, optimize cost, and improve user experience. However, estimating latency in microservice-based architectures is challenging due to the complex interactions between hundreds or thousands of loosely coupled microservices. Current approaches either track only latency-critical paths or require laborious bespoke instrumentation, which is unrealistic for end-to-end latency estimation in complex systems. This paper presents LatenSeer, a modeling framework for estimating end-to-end latency distributions in microservice-based web applications. LatenSeer proposes novel data structures to accurately represent causal relationships between services, overcoming the drawbacks of simple dependency representations that fail to capture the complexity of microservices. LatenSeer leverages distributed tracing data to practically and accurately model end-to-end latency at scale. Our evaluation shows that LatenSeer predicts latency to within 5.35% error, outperforming the state of the art, which has an error rate of more than 9.5%.
Pages: 502-519
Number of pages: 18