LatenSeer: Causal Modeling of End-to-End Latency Distributions by Harnessing Distributed Tracing

Citations: 4
Authors
Zhang, Yazhuo [1]
Isaacs, Rebecca [2]
Yue, Yao [3]
Yang, Juncheng [4]
Zhang, Lei [5]
Vigfusson, Ymir [1, 6]
Affiliations
[1] Emory Univ, Atlanta, GA 30322 USA
[2] Amazon Web Serv, Seattle, WA USA
[3] IOP Syst, San Francisco, CA USA
[4] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[5] Princeton Univ, Princeton, NJ 08544 USA
[6] Keystrike, Atlanta, GA USA
Source
PROCEEDINGS OF THE 2023 ACM SYMPOSIUM ON CLOUD COMPUTING, SOCC 2023 | 2023
Keywords
microservices; distributed tracing; latency estimation; end-to-end latency; inference
DOI
10.1145/3620678.3624787
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
End-to-end latency estimation in web applications is crucial for system operators to foresee the effects of potential changes, helping ensure system stability, optimize cost, and improve user experience. However, estimating latency in microservices-based architectures is challenging due to the complex interactions between hundreds or thousands of loosely coupled microservices. Current approaches either track only latency-critical paths or require laborious bespoke instrumentation, which is unrealistic for end-to-end latency estimation in complex systems. This paper presents LatenSeer, a modeling framework for estimating end-to-end latency distributions in microservice-based web applications. LatenSeer proposes novel data structures to accurately represent causal relationships between services, overcoming the drawbacks of simple dependency representations that fail to capture the complexity of microservices. LatenSeer leverages distributed tracing data to practically and accurately model end-to-end latency at scale. Our evaluation shows that LatenSeer predicts latency within a 5.35% error, outperforming the state-of-the-art that has an error rate of more than 9.5%.
Pages: 502-519
Page count: 18