Inference serving with end-to-end latency SLOs over dynamic edge networks

Times cited: 1
Authors
Nigade, Vinod [1]
Bauszat, Pablo [1]
Bal, Henri [1]
Wang, Lin [1]
Affiliations
[1] Vrije Univ Amsterdam, Amsterdam, Netherlands
Funding
Dutch Research Council (NWO)
Keywords
Inference serving; DNN adaptation; Data adaptation; Dynamic edge networks; Dynamic DNNs; Video
DOI
10.1007/s11241-024-09418-4
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
While high accuracy is of paramount importance for deep learning (DL) inference, serving inference requests on time is equally critical but has not been carefully studied, especially when requests must be served over a dynamic wireless network at the edge. In this paper, we propose Jellyfish, a novel edge DL inference serving system that provides soft guarantees for end-to-end inference latency service-level objectives (SLOs). Jellyfish handles network variability by using both data adaptation and deep neural network (DNN) adaptation to trade off accuracy against latency. Jellyfish features a new design that enables collective adaptation policies, in which data and DNN adaptation decisions are aligned and coordinated among multiple users with varying network conditions. We propose efficient algorithms to continuously map users and adapt DNNs at runtime, so that latency SLOs are met while overall inference accuracy is maximized. We further investigate dynamic DNNs, i.e., DNNs that encompass multiple architecture variants, and demonstrate their potential benefits through preliminary experiments. Experiments based on a prototype implementation and real-world WiFi and LTE network traces show that Jellyfish meets latency SLOs at around the 99th percentile while maintaining high accuracy.
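The accuracy/latency tradeoff described above (shrinking the input or switching to a lighter DNN so that upload time plus inference time stays within the SLO) can be illustrated with a minimal sketch. This is not Jellyfish's algorithm: the variant names, accuracies, latencies, the frame-size model, and the functions `pick_variant` and `assign_users` are all hypothetical, and queueing, downlink time, and server capacity are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class Variant:
    """A hypothetical DNN variant with offline-profiled accuracy and latency."""
    name: str
    input_size: int   # square input resolution in pixels (the data-adaptation knob)
    accuracy: float   # profiled accuracy on a validation set (illustrative values)
    infer_ms: float   # profiled server-side inference time in milliseconds


# Illustrative profile only; real numbers would come from offline profiling.
VARIANTS = [
    Variant("small", 320, 0.30, 10.0),
    Variant("medium", 512, 0.40, 25.0),
    Variant("large", 640, 0.45, 40.0),
]


def frame_bytes(input_size: int, bytes_per_pixel: float = 0.1) -> float:
    # Assumed compressed-frame size model: proportional to the pixel count.
    return input_size * input_size * bytes_per_pixel


def e2e_latency_ms(v: Variant, bandwidth_mbps: float) -> float:
    # Uplink transfer time (bytes -> bits -> ms) plus inference time.
    # Queueing delay and the downlink for results are ignored in this sketch.
    upload_ms = frame_bytes(v.input_size) * 8 / (bandwidth_mbps * 1e6) * 1e3
    return upload_ms + v.infer_ms


def pick_variant(variants: Sequence[Variant], bandwidth_mbps: float,
                 slo_ms: float) -> Optional[Variant]:
    # Most accurate variant whose estimated end-to-end latency fits the SLO;
    # returns None when no variant can meet the SLO at this bandwidth.
    feasible = [v for v in variants if e2e_latency_ms(v, bandwidth_mbps) <= slo_ms]
    return max(feasible, key=lambda v: v.accuracy, default=None)


def assign_users(users: Dict[str, float], variants: Sequence[Variant],
                 slo_ms: float) -> Dict[Optional[str], List[str]]:
    # Group users (id -> estimated bandwidth in Mbit/s) by their chosen variant,
    # so users with similar network conditions share one DNN; key None = infeasible.
    groups: Dict[Optional[str], List[str]] = {}
    for user_id, bw in users.items():
        v = pick_variant(variants, bw, slo_ms)
        key = v.name if v is not None else None
        groups.setdefault(key, []).append(user_id)
    return groups


if __name__ == "__main__":
    users = {"u1": 20.0, "u2": 2.0, "u3": 0.5}
    print(assign_users(users, VARIANTS, slo_ms=100.0))
```

With these illustrative numbers and a 100 ms SLO, the script prints `{'large': ['u1'], 'small': ['u2'], None: ['u3']}`: the well-connected user gets the most accurate variant, the constrained user falls back to a smaller input and model, and the user on a 0.5 Mbit/s link cannot meet the SLO with any variant. Exhaustive search is adequate here only because the variant set is tiny; the grouping step is a rough stand-in for coordinated, multi-user adaptation and omits the shared server capacity that a fuller treatment would need to model.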
Pages: 239-290
Page count: 52