Cutting-Edge Inference: Dynamic DNN Model Partitioning and Resource Scaling for Mobile AI

Cited by: 1

Authors:

Lim, Jeong-A [1]

Lee, Joohyun [2]

Kwak, Jeongho [3]

Kim, Yeongjin [1]

Affiliations:

[1] Inha Univ, Dept Elect Engn, Incheon 22212, South Korea

[2] Hanyang Univ, Dept Elect & Elect Engn, Ansan 15588, South Korea

[3] Daegu Gyeongbuk Inst Sci & Technol DGIST, Informat & Commun Engn, Daegu 42988, South Korea

Source:

IEEE TRANSACTIONS ON SERVICES COMPUTING | 2024 / Vol. 17 / No. 06

Funding:

National Research Foundation, Singapore;

Keywords:

Mobile handsets; Computational modeling; Servers; Artificial intelligence; Quality of experience; Artificial neural networks; Accuracy; DNN model partitioning; deep learning; mobile edge computing; mobile vision application; quality of experience; ALLOCATION;

DOI:

10.1109/TSC.2024.3466848

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Recently, mobile applications that use artificial intelligence (AI) techniques, such as augmented reality, have become pervasive. The quality of experience (QoE) of such applications depends on the hardware specifications of mobile devices, dynamic service demands, stochastic network states, and the characteristics of deep neural network (DNN) models. In this paper, we propose CutEdge, which leverages a virtual queue-based Lyapunov optimization framework to jointly optimize (a) DNN model partitioning between a mobile device and a mobile edge computing (MEC) server and (b) the processing and networking resources of the mobile device, in response to internal and external system dynamics. Specifically, CutEdge simultaneously decides (i) the partition point of the DNN model between the mobile device and the MEC server, (ii) the GPU clock frequency, and (iii) the transmission rate of the mobile device. We then theoretically derive the optimal trade-off curves among energy consumption, throughput, and end-to-end latency achieved by CutEdge; previous studies have not addressed these QoE metrics jointly. Moreover, we show how jointly optimizing the three control parameters affects performance via real trace-driven simulations. Finally, we demonstrate the superiority of CutEdge over existing algorithms through experiments on a testbed implemented with an embedded AI device and an MEC server.
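The abstract does not give the CutEdge formulation, so the following is only a minimal sketch of a generic Lyapunov drift-plus-penalty step over the three control knobs it names (partition point, GPU clock frequency, transmission rate). Every constant, the cost model, and the function names (`frame_time`, `frames_served`, `slot_energy`, `drift_plus_penalty_step`) are hypothetical illustrations, not the paper's model; server-side processing time and latency queues are ignored for brevity.

```python
from itertools import product

# All numbers below are hypothetical toy values, NOT taken from the paper.
CUTS = [0, 1, 2, 3]                    # layers run on the device before offloading
FREQS_GHZ = [0.5, 1.0, 1.3]            # candidate GPU clock frequencies
RATES_MBPS = [5.0, 20.0]               # candidate uplink transmission rates
FEATURE_MBIT = [8.0, 4.0, 2.0, 0.1]    # data uploaded when cutting after layer i
GCYCLES_PER_LAYER = 0.02               # GPU work per on-device layer

# Joint action space: every (partition point, GPU frequency, rate) combination.
ACTIONS = list(product(CUTS, FREQS_GHZ, RATES_MBPS))


def frame_time(action):
    """Seconds per frame: on-device compute plus upload (server time ignored)."""
    cut, freq, rate = action
    return cut * GCYCLES_PER_LAYER / freq + FEATURE_MBIT[cut] / rate


def frames_served(action, slot=1.0):
    """Frames the device can push through in one time slot."""
    return slot / frame_time(action)


def slot_energy(action, slot=1.0):
    """Toy energy per slot: compute energy ~ cycles * f^2, radio power grows with rate."""
    cut, freq, rate = action
    compute = cut * GCYCLES_PER_LAYER * freq ** 2
    radio = (FEATURE_MBIT[cut] / rate) * (0.5 + 0.05 * rate)
    return (compute + radio) * frames_served(action, slot)


def drift_plus_penalty_step(queue, arrivals, actions, v):
    """Choose the action minimising v*energy - queue*service, then update the queue."""
    best = min(actions, key=lambda a: v * slot_energy(a) - queue * frames_served(a))
    # Standard queue recursion: serve what we can, then add new arrivals.
    next_queue = max(queue - frames_served(best), 0.0) + arrivals
    return best, next_queue


if __name__ == "__main__":
    q = 0.0
    for t in range(5):
        action, q = drift_plus_penalty_step(q, arrivals=10.0, actions=ACTIONS, v=5.0)
        print(t, action, round(q, 2))
```

In this toy setup the weight `v` steers the trade-off: a larger `v` favours low-energy actions and lets the backlog grow, while a growing backlog pushes the choice toward higher-throughput actions. This is the general behaviour of drift-plus-penalty methods and is offered only as intuition for the trade-off curves the abstract mentions.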

Pages: 3300-3316

Page count: 17

References: 44 in total
