CODEC: Cost-Effective Duration Prediction System for Deadline Scheduling in the Cloud

Cited by: 1
Authors
Li, Haozhe [1, 2]
Ma, Minghua [2 ]
Liu, Yudong [2 ]
Qin, Si [2 ]
Qiao, Bo [2 ]
Yao, Randolph [3 ]
Chaturvedi, Harshwardhan [3 ]
Tran, Tri [3]
Chintalapati, Murali [3 ]
Rajmohan, Saravan [3 ]
Lin, Qingwei [2 ]
Zhang, Dongmei [2 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Microsoft, Beijing, Peoples R China
[3] Microsoft, Redmond, WA USA
Source
2023 IEEE 34TH INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING, ISSRE | 2023
Keywords
Cloud Systems; Deadline Scheduling
DOI
10.1109/ISSRE59848.2023.00069
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Modern cloud platforms allow customers to flexibly allocate or release computing resources. One crucial scenario is driving existing VMs to a specific state by a given deadline in a reliable and cost-effective manner. These state transition requests may involve starting or reallocating VMs. Performing millions of such requests per day is challenging because the throughput of cloud platforms is not always deterministic. To meet customer service level agreements, cloud providers often trade off cost for reliability, using oversized duration estimates so that state transitions are initiated well ahead of the deadline. In this paper, we propose a COst-effective Duration prEdiCtion system (CODEC) that solves the deadline scheduling problem for cloud providers by building intelligent automation to discover and execute optimal strategies when performing concurrent requests. The core of CODEC is to categorize request durations into buckets in real time and then predict the buffer time of each bucket using extreme value theory. Extensive experiments show that the proposed approach guarantees the success rate of requests while reducing cost by 38.71% and 86.97% compared with the baseline approach and the static buffer time, respectively. The results show that CODEC solves the deadline scheduling problem effectively and efficiently, and they have led a worldwide cloud provider to deploy its prototype.
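The abstract does not say how durations are bucketed or which extreme value estimator is used, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes a peaks-over-threshold approach: for each bucket, the excesses over a high empirical threshold are fitted with a generalized Pareto distribution using the method of moments, and the buffer time is the duration expected to be exceeded with probability `risk`. The bucket names, the function names, and the parameters `risk` and `threshold_quantile` are all hypothetical.

```python
import math
import random
import statistics


def pot_quantile(durations, risk=0.001, threshold_quantile=0.95):
    """Estimate the duration exceeded with probability `risk` (assumed POT method)."""
    data = sorted(durations)
    n = len(data)
    t = data[int(threshold_quantile * (n - 1))]  # high empirical threshold
    excesses = [x - t for x in data if x > t]
    if len(excesses) < 2:
        return data[-1]  # too few tail samples: fall back to the observed maximum
    mean = statistics.fmean(excesses)
    var = statistics.variance(excesses)
    if var == 0:
        return t + mean  # degenerate tail: all excesses are identical
    # Method-of-moments estimates of the generalized Pareto shape and scale.
    ratio = mean * mean / var
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * mean * (ratio + 1.0)
    tail_ratio = risk * n / len(excesses)
    if abs(xi) < 1e-6:
        return t - sigma * math.log(tail_ratio)  # exponential-tail limit
    return t + (sigma / xi) * (tail_ratio ** (-xi) - 1.0)


def buffer_times(samples_by_bucket, risk=0.001):
    """Map each (hypothetical) bucket to its predicted buffer time."""
    return {
        bucket: pot_quantile(durations, risk=risk)
        for bucket, durations in samples_by_bucket.items()
    }


if __name__ == "__main__":
    # Synthetic durations in seconds; real request logs would replace these.
    random.seed(0)
    samples = {
        "start": [random.expovariate(1 / 30) for _ in range(5000)],
        "reallocate": [random.expovariate(1 / 120) for _ in range(5000)],
    }
    for bucket, buffer in buffer_times(samples).items():
        print(f"{bucket}: start the request about {buffer:.0f} s before the deadline")
```

A per-bucket estimate of this kind replaces a single static buffer with one sized to each bucket's observed tail, which is the trade-off between cost and reliability that the abstract describes.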
Pages: 298-308
Page count: 11