Overload Control for Scaling WeChat Microservices

Cited by: 70
Authors
Zhou, Hao [1]
Chen, Ming [1]
Lin, Qian [2]
Wang, Yong [1]
She, Xiaobin [1]
Liu, Sifan [1]
Gu, Rui [3]
Ooi, Beng Chin [2]
Yang, Junfeng [3]
Affiliations
[1] Tencent Inc., Shenzhen, People's Republic of China
[2] National University of Singapore, Singapore
[3] Columbia University, New York, NY, USA
Source
PROCEEDINGS OF THE 2018 ACM SYMPOSIUM ON CLOUD COMPUTING (SOCC '18) | 2018
Keywords
overload control; service admission control; microservice architecture; WeChat;
DOI
10.1145/3267809.3267823
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Effective overload control is crucial for protecting the backend of a large-scale online service system. Conventionally, overload control is designed ad hoc for each individual service. However, service-specific overload control can be detrimental to the overall system because of intricate service dependencies or flawed service implementations. Moreover, service developers usually have difficulty accurately estimating the dynamics of the actual workload during development. It is therefore essential to decouple overload control from service logic. In this paper, we propose DAGOR, an overload control scheme designed for the account-oriented microservice architecture. DAGOR is service-agnostic and system-centric. It manages overload at microservice granularity: each microservice monitors its load status in real time and, when overload is detected, triggers load shedding in collaboration with its relevant services. DAGOR has been used in the WeChat backend for five years. Experimental results show that DAGOR sustains a high service success rate even when the system is overloaded, while ensuring fairness in overload control.
Pages: 149-161
Page count: 13