Transparent Request Tracing and Sampling Method for Java-based Microservice System

Cited by: 0
Authors
Huang Z.-C. [1]
Chen P.-F. [1]
Yu G.-B. [1]
Chen H.-Y. [1]
Affiliations
[1] School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou
Source
Ruan Jian Xue Bao/Journal of Software | 2023 / Vol. 34 / No. 7
Keywords
cloud computing; dynamic instrumentation; microservice; request tracing; sampling
DOI
10.13328/j.cnki.jos.006523
Abstract
Microservices are becoming the mainstream architecture for cloud-based software systems because they support agile development and rapid deployment. However, the structure of a microservice system is complex: it often has hundreds of service instances, and the call relationships between services are intricate. When an anomaly occurs in such a system, it is difficult to locate its root causes. End-to-end request tracing has therefore become a standard component of microservice systems. However, current distributed request tracing methods are intrusive to applications and rely heavily on the developers’ expertise in request tracing. In addition, they cannot start or stop the tracing functionality at runtime. These defects not only increase the burden on developers but also restrict the adoption of distributed request tracing in practice. This study designs and implements a transparent request tracing system named Trace++, which generates tracing code automatically and injects the generated code into the running application by using dynamic code instrumentation. Trace++ is minimally intrusive to programs, transparent to developers, and can start or stop the tracing functionality flexibly. In addition, the adaptive sampling method of Trace++ effectively reduces the cost of request tracing. Experiments conducted on TrainTicket, a microservice system, show that Trace++ discovers the dependencies between services accurately and that, when request tracing is enabled, its performance cost is close to that of source code instrumentation. When the request tracing functionality is stopped, Trace++ incurs no performance cost. Moreover, the adaptive sampling method preserves representative trace data while reducing the volume of trace data by 89.4%. © 2023 Chinese Academy of Sciences. All rights reserved.
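The abstract does not describe how the adaptive sampling in Trace++ works, so the following Java fragment is only an illustrative sketch of the general idea of keeping representative traces while discarding most of the repetitive ones. It is not the algorithm from the paper. The class name `ShapeAwareSampler`, the power-of-two retention rule, the service names, and the `setEnabled` switch (a stand-in for starting or stopping tracing at runtime) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of shape-aware sampling; NOT the Trace++ algorithm. */
public class ShapeAwareSampler {
    // Number of traces seen so far for each call-path shape.
    private final Map<String, Long> seenPerShape = new HashMap<>();
    // Assumed runtime switch standing in for starting/stopping tracing.
    private volatile boolean enabled = true;

    public void setEnabled(boolean enabled) {
        this.enabled = enabled;
    }

    /**
     * Decides whether to keep a trace whose call path is the given ordered
     * list of service names. The n-th trace of a shape is kept only when n
     * is a power of two (1, 2, 4, 8, ...), so rare shapes are always
     * represented while frequent shapes are thinned out progressively.
     */
    public synchronized boolean shouldKeep(List<String> callPath) {
        if (!enabled) {
            return false; // tracing is off: keep nothing, count nothing
        }
        String shape = String.join("->", callPath);
        long n = seenPerShape.merge(shape, 1L, Long::sum);
        return (n & (n - 1)) == 0; // true exactly when n is a power of two
    }

    public static void main(String[] args) {
        ShapeAwareSampler sampler = new ShapeAwareSampler();
        // Service names below are made up for the example.
        List<String> common = List.of("gateway", "order", "payment");
        List<String> rare = List.of("gateway", "order", "refund");

        int kept = 0;
        for (int i = 0; i < 1000; i++) {
            if (sampler.shouldKeep(common)) {
                kept++;
            }
        }
        System.out.println("kept common: " + kept + " of 1000");
        System.out.println("kept rare: " + sampler.shouldKeep(rare));
    }
}
```

In this sketch a frequently repeated call path contributes only a logarithmic number of retained traces, while the first occurrence of any new call path is always kept. A production sampler would likely also weigh latency, errors, and time windows, none of which are modelled here.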
Pages: 3167-3187
Number of pages: 20
Related papers
41 in total
  • [1] Lin JJ, Chen PF, Zheng ZB., Microscope: Pinpoint performance issues with causal graphs in micro-service environments, Proc. of the 16th Int’l Conf. on Service-oriented Computing, pp. 3-20, (2018)
  • [2] Yu GB, Chen PF, Zheng ZB., Microscaler: Automatic scaling for microservices with an online learning approach, Proc. of the 2019 IEEE Int’l Conf. on Web Services, pp. 68-75, (2019)
  • [3] Yu GB, Chen PF, Chen HY, Guan ZJ, Huang ZC, Jing LX, Weng TJ, Sun XM, Li XY., MicroRank: End-to-end latency issue localization with extended spectrum analysis in microservice environments, Proc. of the 2021 Web Conf., pp. 3087-3098, (2021)
  • [4] Yang Y, Li Y, Wu ZH., Survey of state-of-the-art distributed tracing technology, Ruan Jian Xue Bao/Journal of Software, 31, 7, pp. 2019-2039, (2020)
  • [5] Chanda A, Cox AL, Zwaenepoel W., Whodunit: Transactional profiling for multi-tier applications, Proc. of the 2nd ACM SIGOPS/EuroSys European Conf. on Computer Systems, pp. 17-30, (2007)
  • [6] Sambasivan RR, Fonseca RLC, Shafer I, Ganger RG., So, you want to trace your distributed system? Key design insights from years of practical experience, (2014)
  • [7] He ZL, Chen PF, Li XY, Wang YF, Yu GB, Chen CL, Li XR, Zheng ZB., A spatiotemporal deep learning approach for unsupervised anomaly detection in cloud systems, IEEE Trans. on Neural Networks and Learning Systems, pp. 1-15, (2020)
  • [8] Barham P, Donnelly A, Isaacs R, Mortier R., Using Magpie for request extraction and workload modelling, Proc. of the 6th Conf. on Symp. on Operating Systems Design and Implementation, (2004)
  • [9] Chen MY, Accardi A, Kiciman E, Lloyd J, Patterson D, Fox A, Brewer E., Path-based failure and evolution management, Proc. of the 1st Symp. on Networked Systems Design and Implementation, pp. 309-322, (2004)
  • [10] Fonseca R, Freedman MJ, Porter G., Experiences with tracing causality in networked services, Proc. of the 2010 Internet Network Management Conf. on Research on Enterprise Networking, (2010)