TBDB: Token Bucket-Based Dynamic Batching for Resource Scheduling Supporting Neural Network Inference in Intelligent Consumer Electronics

Cited by: 21
Authors
Gao, Honghao [1 ,2 ]
Qiu, Binyang [1 ]
Wang, Ye [1 ]
Yu, Si [1 ]
Xu, Yueshen [3 ]
Wang, Xinheng [4 ]
Affiliations
[1] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China
[2] Gachon Univ, Coll Future Ind, Seongnam 461701, Gyeonggi, South Korea
[3] Xidian Univ, Sch Comp Sci & Technol, Xian 710126, Peoples R China
[4] Xian Jiaotong Liverpool Univ, Sch Adv Technol, Suzhou 215123, Peoples R China
Keywords
Task analysis; Throughput; Heuristic algorithms; Consumer electronics; Computational modeling; Servers; Performance evaluation; inference task; dynamic batching; workload balance; token bucket; neural network;
DOI
10.1109/TCE.2023.3339633
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Classification Codes
0808 ; 0809 ;
Abstract
Consumer electronics such as mobile phones, wearable devices, and vehicle electronics use many intelligent applications such as voice commands, machine translation, and face recognition. These applications require large inference workloads to perform intelligent tasks, which are often completed using deep neural network (DNN) models. Traditional approaches rely on pure cloud computing, with consumer devices collecting data and cloud computing platforms completing inference tasks. In real life, the workloads of these applications are not fixed and are likely to exhibit fluctuations or unexpected surges, increasing the workload of cloud computing platforms. Simply increasing server resources often leads to resource waste. Therefore, a dynamic resource scheduling method is needed. In this paper, a token bucket-based dynamic batching (TBDB) algorithm that maintains throughput while reducing latency and increasing device utilization, especially for large volumes of requests, is proposed. Our work includes the following achievements: 1) We employ the token bucket algorithm to determine the workload, considering the concurrency and frequency of the data. We dynamically vary the maximum batch size (MBS) that will trigger the inference process for the next batch. 2) A low-coupling mode architecture that can be embedded into various consumer electronics in a plug-and-play manner is designed. 3) The performance of the electronic devices and the maximum latency are studied to provide guidance for setting hyperparameters. Finally, we evaluate the effectiveness of our method in three consumer electronic scenarios and present a theoretical analysis for setting hyperparameters in different scenarios.
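The abstract's core idea (a token bucket gauging request workload, which in turn adjusts the maximum batch size that triggers inference) can be sketched as follows. This is an illustrative reading of the mechanism only, not the paper's implementation; the class name `TokenBucketBatcher`, the linear mapping from bucket occupancy to MBS, and all parameter names are assumptions introduced for the example.

```python
import time

class TokenBucketBatcher:
    """Sketch of token bucket-driven dynamic batching (assumed design).

    Tokens refill at `rate` and each incoming request consumes one. A low
    token level signals a workload surge, so the maximum batch size (MBS)
    grows to favor throughput; a full bucket signals light load, so MBS
    shrinks to favor latency.
    """

    def __init__(self, rate=100.0, capacity=200, min_mbs=1, max_mbs=32):
        self.rate = rate              # token refill rate (tokens/second)
        self.capacity = capacity      # bucket capacity
        self.tokens = float(capacity)
        self.min_mbs, self.max_mbs = min_mbs, max_mbs
        self.last = time.monotonic()
        self.queue = []

    def _refill(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def current_mbs(self):
        # Interpolate MBS from bucket occupancy: fewer tokens -> larger batch.
        self._refill()
        load = 1.0 - self.tokens / self.capacity   # 0 = idle, 1 = saturated
        return self.min_mbs + round(load * (self.max_mbs - self.min_mbs))

    def submit(self, request):
        # Consume a token per request; emit a batch once the dynamic MBS
        # is reached, otherwise keep queueing.
        self._refill()
        self.tokens = max(0.0, self.tokens - 1.0)
        self.queue.append(request)
        if len(self.queue) >= self.current_mbs():
            batch, self.queue = self.queue, []
            return batch              # hand the batch to the inference engine
        return None
```

A production version would also need the maximum-latency timeout the paper studies (flushing a partial batch after a deadline), which is omitted here for brevity.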
Pages: 1134-1144
Page count: 11
Related Papers (41 total)
  • [1] BATCH: Machine Learning Inference Serving on Serverless Platforms with Adaptive Batching
    Ali, Ahsan
    Pinciroli, Riccardo
    Yan, Feng
    Smirni, Evgenia
    [J]. PROCEEDINGS OF SC20: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC20), 2020,
  • [2] [Anonymous], Triton inference serving
  • [3] [Anonymous], TensorRT
  • [4] Trading Private Range Counting over Big IoT Data
    Cai, Zhipeng
    He, Zaobo
    [J]. 2019 39TH IEEE INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2019), 2019, : 144 - 153
  • [5] A Private and Efficient Mechanism for Data Uploading in Smart Cyber-Physical Systems
    Cai, Zhipeng
    Zheng, Xu
    [J]. IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2020, 7 (02): : 766 - 775
  • [6] A distributed game theoretical approach for credibility-guaranteed multimedia data offloading in MEC
    Chen, Ying
    Zhao, Jie
    Zhou, Xiaokang
    Qi, Lianyong
    Xu, Xiaolong
    Huang, Jiwei
    [J]. INFORMATION SCIENCES, 2023, 644
  • [7] QoS-Aware Computation Offloading in LEO Satellite Edge Computing for IoT: A Game-Theoretical Approach
    Chen, Ying
    Hu, Jintao
    Zhao, Jie
    Min, Geyong
    [J]. CHINESE JOURNAL OF ELECTRONICS, 2024, 33 (04) : 875 - 885
  • [8] LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference
    Choi, Yujeong
    Kim, Yunseong
    Rhu, Minsoo
    [J]. 2021 27TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2021), 2021, : 493 - 506
  • [9] Crankshaw Daniel, 2020, SoCC '20: Proceedings of the 11th ACM Symposium on Cloud Computing, P477, DOI 10.1145/3419111.3421285
  • [10] Crankshaw D., 2018, Queue, V16, P83