CoFB: latency-constrained co-scheduling of flows and batches for deep learning inference service on the CPU-GPU system

Times cited: 1

Authors:

Zhang, Qi [1]

Liu, Yi [1]

Liu, Tao [2]

Qian, Depei [1]

Affiliations:

[1] Beihang Univ, Sch Comp Sci & Engn, 37 Xueyuan Rd, Beijing 100190, Peoples R China

[2] Shandong Prov Key Lab Comp Networks, 28666 Jingshi Dong Lu, Jinan 250103, Shandong, Peoples R China

Source:

JOURNAL OF SUPERCOMPUTING | 2023 / Vol. 79 / No. 13

Keywords:

Deep learning; Inference; Quality of service; Tail latency; GPU

DOI:

10.1007/s11227-023-05183-6

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline code:

0812

Abstract:

Recent years have witnessed significant advances in deep learning (DL) technologies, and a growing number of online service operators now use DL to provide intelligent, personalized services. Considerable effort has gone into optimizing inference efficiency. However, our investigation shows that for many DL models serving data-intensive requests, the network I/O subsystem also plays an essential role in determining responsiveness. Furthermore, under a latency constraint, uncontrolled network flow processing interferes with request batching. Based on these observations, this paper proposes CoFB, an inference service system that optimizes performance holistically. CoFB mitigates load imbalance in the network I/O subsystem with a lightweight flow scheduling scheme in which the network interface card (NIC) cooperates with a dispatcher thread. In addition, CoFB introduces two mechanisms to enforce deadline-aware inference: a request reordering and batching policy, and an interference-aware concurrent batch throttling strategy. We evaluate CoFB on four DL inference services and compare it against two state-of-the-art inference systems, NVIDIA Triton and DVABatch. Experimental results show that, under preset tail-latency objectives, CoFB sustains up to 2.69x higher load than Triton and up to 1.96x higher load than DVABatch.
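
The abstract names a deadline-aware request reordering and batching policy but does not state its rule. The Python sketch below is only a minimal illustration under stated assumptions; it is not CoFB's algorithm. It assumes earliest-deadline-first (EDF) ordering and a profiled, non-decreasing batch-latency function. The names `Request`, `form_batch`, and the linear latency model are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Request:
    rid: int
    deadline_ms: float  # absolute deadline = arrival time + tail-latency target


def form_batch(
    queue: List[Request],
    now_ms: float,
    batch_latency_ms: Callable[[int], float],
    max_batch: int,
) -> Tuple[List[Request], List[Request], List[Request]]:
    """One hypothetical scheduling step: returns (batch, remaining, missed).

    Assumes batch_latency_ms(b) is the profiled execution time of a batch of
    size b, is non-decreasing in b, and that max_batch >= 1.
    """
    # Reorder pending requests earliest-deadline-first (EDF).
    ordered = sorted(queue, key=lambda r: r.deadline_ms)

    # A request that cannot finish in time even in a batch of one is a miss.
    solo_finish = now_ms + batch_latency_ms(1)
    missed = [r for r in ordered if r.deadline_ms < solo_finish]
    feasible = [r for r in ordered if r.deadline_ms >= solo_finish]
    if not feasible:
        return [], [], missed

    # The head of the EDF order has the tightest deadline in any prefix, so
    # grow the batch only while that deadline still holds.
    size = 1
    limit = min(max_batch, len(feasible))
    while size < limit and now_ms + batch_latency_ms(size + 1) <= feasible[0].deadline_ms:
        size += 1
    return feasible[:size], feasible[size:], missed


if __name__ == "__main__":
    def linear_latency(b: int) -> float:
        return 4.0 + 1.5 * b  # hypothetical latency model, in ms

    pending = [Request(0, 30.0), Request(1, 9.0), Request(2, 25.0),
               Request(3, 4.0), Request(4, 40.0)]
    batch, rest, missed = form_batch(pending, 0.0, linear_latency, max_batch=8)
    print([r.rid for r in batch], [r.rid for r in rest], [r.rid for r in missed])
    # prints: [1, 2, 0] [4] [3]
```

Because the queue is sorted by deadline, the head request bounds the size of the whole batch. A real serving system would also have to account for queueing delay and network I/O time, which this sketch ignores.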

Pages: 14172-14199

Page count: 28

Number of references: 47
