Inference serving with end-to-end latency SLOs over dynamic edge networks

Cited by: 1
Authors
Nigade, Vinod [1 ]
Bauszat, Pablo [1 ]
Bal, Henri [1 ]
Wang, Lin [1 ]
Affiliations
[1] Vrije Univ Amsterdam, Amsterdam, Netherlands
Funding
Dutch Research Council
Keywords
Inference serving; DNN adaptation; Data adaptation; Dynamic edge networks; Dynamic DNNs; Video
DOI
10.1007/s11241-024-09418-4
CLC Number
TP301 [Theory, Methods]
Discipline Code
081202
Abstract
While high accuracy is of paramount importance for deep learning (DL) inference, serving inference requests on time is equally critical but has not been carefully studied, especially when requests must be served over a dynamic wireless network at the edge. In this paper, we propose Jellyfish, a novel edge DL inference serving system that achieves soft guarantees for end-to-end inference latency service-level objectives (SLOs). Jellyfish handles network variability by using both data and deep neural network (DNN) adaptation to trade off accuracy against latency. Jellyfish features a new design that enables collective adaptation policies, where the decisions for data and DNN adaptation are aligned and coordinated among multiple users with varying network conditions. We propose efficient algorithms to continuously map users to DNNs and adapt those DNNs at runtime, so that latency SLOs are fulfilled while the overall inference accuracy is maximized. We further investigate dynamic DNNs, i.e., DNNs that encompass multiple architecture variants, and demonstrate their potential benefit through preliminary experiments. Our experiments, based on a prototype implementation and real-world WiFi and LTE network traces, show that Jellyfish can meet latency SLOs at around the 99th percentile while maintaining high accuracy.
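The abstract's core idea, jointly picking an input resolution (data adaptation) and a DNN variant (DNN adaptation) so that estimated transmission plus inference time fits the latency SLO while accuracy is maximized, can be illustrated with a minimal sketch. The profile numbers, variant names, and the exhaustive search below are illustrative assumptions, not Jellyfish's actual scheduling algorithm.

```python
# Hypothetical accuracy/latency profiles (illustrative numbers only).
DNN_VARIANTS = {            # name -> (base accuracy, inference latency in s)
    "small":  (0.60, 0.010),
    "medium": (0.72, 0.030),
    "large":  (0.80, 0.080),
}
RESOLUTIONS = {             # input size -> (payload in bits, accuracy scale)
    224: (0.5e6, 0.85),
    384: (1.5e6, 1.00),
    512: (3.0e6, 1.05),
}

def pick_config(bandwidth_bps, slo_s):
    """Search (resolution, DNN) pairs and return the most accurate one
    whose estimated network + inference time fits within the SLO."""
    best = None
    for res, (bits, scale) in RESOLUTIONS.items():
        tx = bits / bandwidth_bps            # transmission time estimate
        for name, (base_acc, inf) in DNN_VARIANTS.items():
            if tx + inf <= slo_s:
                acc = base_acc * scale       # accuracy of this pairing
                if best is None or acc > best[0]:
                    best = (acc, res, name)
    return best  # None if no configuration can meet the SLO

# As the estimated bandwidth drops, the chosen resolution and DNN
# degrade gracefully instead of violating the SLO.
print(pick_config(50e6, 0.10))   # high bandwidth
print(pick_config(6e6, 0.10))    # low bandwidth
```

A per-user choice like this only covers the single-user case; the coordinated multi-user mapping that the paper describes additionally has to share GPU-side DNN instances across users with different network conditions.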
Pages: 239-290
Page count: 52