Baymax: QoS Awareness and Increased Utilization for Non-Preemptive Accelerators in Warehouse Scale Computers

Times cited: 111

Authors:

Chen, Quan [1,2]

Yang, Hailong [1,3]

Mars, Jason [1]

Tang, Lingjia [1]

Affiliations:

[1] Univ Michigan, Clar Lab, Ann Arbor, MI 48109 USA

[2] Shanghai Jiao Tong Univ, Shanghai, Peoples R China

[3] Beihang Univ, Beijing, Peoples R China

Source:

ACM SIGPLAN NOTICES | 2016 / Vol. 51 / Issue 04

Funding:

U.S. National Science Foundation;

Keywords:

scheduling; quality of service; warehouse scale computers; non-preemptive accelerators; FRAMEWORK;

DOI:

10.1145/2954679.2872368

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline code:

081202; 0835;

Abstract:

Modern warehouse-scale computers (WSCs) are being outfitted with accelerators to provide the significant compute required by emerging intelligent personal assistant (IPA) workloads such as voice recognition, image classification, and natural language processing. It is well known that the diurnal user access pattern of user-facing services provides a strong incentive to co-locate applications for better accelerator utilization and efficiency, and prior work has focused on enabling co-location on multicore processors. However, interference when co-locating applications on non-preemptive accelerators is fundamentally different from contention on multicore CPUs and introduces a new set of challenges for reducing QoS violations. To address this open problem, we first identify the underlying causes of QoS violations in accelerator-outfitted servers. Our experiments show that queuing delay for the compute resources and PCI-e bandwidth contention for data transfer are the two main factors that contribute to the long tails of user-facing applications. We then present Baymax, a runtime system that orchestrates the execution of compute tasks from different applications and mitigates PCI-e bandwidth contention to deliver the required QoS for user-facing applications and increase accelerator utilization. Using DjiNN (a deep neural network service), Sirius (an end-to-end IPA workload), and traditional applications on an NVIDIA K40 GPU, our evaluation shows that Baymax improves accelerator utilization by 91.3% while achieving the desired 99%-ile latency target for user-facing applications. In fact, Baymax reduces the 99%-ile latency of user-facing applications by up to 195x over default execution.
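To make the queuing-delay argument concrete, the following is a minimal Python sketch, not Baymax's implementation. It simulates a single non-preemptive accelerator and compares FIFO dispatch with a simple priority rule that runs queued user-facing tasks before queued batch tasks. The Task fields, the simulate and percentile helpers, and the example workload are hypothetical; task durations are assumed to be known exactly, whereas a real runtime would have to predict them.

```python
import math
from dataclasses import dataclass


@dataclass
class Task:
    arrival: float     # submission time in ms (hypothetical workload)
    duration: float    # execution time on the device in ms; assumed known
    user_facing: bool  # True for latency-critical queries


def simulate(tasks, prioritize_user_facing):
    """Run tasks on one non-preemptive device; return latency (finish - arrival) per task."""
    pending = sorted(range(len(tasks)), key=lambda i: tasks[i].arrival)
    ready = []
    latency = [0.0] * len(tasks)
    clock = 0.0
    while pending or ready:
        # Move every task that has arrived by `clock` into the ready queue.
        while pending and tasks[pending[0]].arrival <= clock:
            ready.append(pending.pop(0))
        if not ready:
            clock = tasks[pending[0]].arrival  # device idles until the next arrival
            continue
        if prioritize_user_facing:
            # Queued user-facing tasks first, then arrival order.
            pick = min(ready, key=lambda i: (not tasks[i].user_facing, tasks[i].arrival))
        else:
            pick = min(ready, key=lambda i: tasks[i].arrival)  # plain FIFO
        ready.remove(pick)
        clock += tasks[pick].duration  # non-preemptive: the task runs to completion
        latency[pick] = clock - tasks[pick].arrival
    return latency


def percentile(values, p):
    """Nearest-rank percentile, e.g. p=99 for the 99%-ile latency."""
    s = sorted(values)
    k = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[k]


if __name__ == "__main__":
    # Three 10 ms batch kernels queued at t=0, then three 1 ms user-facing queries.
    tasks = [Task(0.0, 10.0, False)] * 3 + [
        Task(0.5, 1.0, True), Task(1.0, 1.0, True), Task(2.0, 1.0, True)]
    for prio in (False, True):
        lat = simulate(tasks, prio)
        uf = [l for t, l in zip(tasks, lat) if t.user_facing]
        print("priority" if prio else "FIFO", percentile(uf, 99))
    # FIFO 31.0
    # priority 11.0
```

Even with priority dispatch, the first user-facing query still waits about 9.5 ms behind the batch task already running, because a non-preemptive device cannot interrupt it. This is the kind of queuing delay the abstract identifies as a cause of long tails, and it is why the decision of which batch task to issue matters before that task starts.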

Pages: 681–696

Page count: 16

Cited references: 60

