NeuroBalancer: Balancing System Frequencies With Punctual Laziness for Timely and Energy-Efficient DNN Inferences

Cited by: 0
Authors
Bin, Kyungmin [1 ]
Kim, Seyeon [2 ]
Ha, Sangtae [2 ]
Chong, Song [3 ]
Lee, Kyunghan [1 ]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea
[2] Univ Colorado, Comp Sci Dept, Boulder, CO 80309 USA
[3] Korea Adv Inst Sci & Technol KAIST, Grad Sch AI, Daejeon 34141, South Korea
Funding
National Research Foundation of Singapore
Keywords
Mobile computing; DVFS; deep learning; energy efficiency; on-device inference; mobile
DOI
Not available
Chinese Library Classification
TP [automation technology, computer technology]
Discipline code
0812
Abstract
On-device deep neural network (DNN) inference is often desirable for user experience and privacy. Existing solutions fully utilize resources to minimize inference latency. However, they incur severe energy inefficiency by completing DNN inference much earlier than the required service interval, which poses a new challenge: how to make DNN inferences in a punctual yet energy-efficient manner. To tackle this challenge, we propose a new resource allocation strategy for DNN processing, namely punctual laziness, which disperses the workload as efficiently as possible over time within a strict delay constraint. This strategy is particularly beneficial for neural workloads because a DNN comprises a set of popular operators whose latency and energy consumption are predictable. Building on this insight, we propose NeuroBalancer, an operator-aware core and memory frequency scaling framework that balances those frequencies as efficiently as possible while making timely inferences. We implement and evaluate NeuroBalancer on off-the-shelf Android devices with various state-of-the-art DNN models. Our results show that NeuroBalancer meets given inference latency requirements while saving up to 43.9% and 21.1% of energy consumption on CPU and GPU, respectively, compared to Android's default governor, and up to 42.1% and 18.6% compared to SysScale, the state-of-the-art mobile governor.
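The "punctual laziness" idea in the abstract can be illustrated with a minimal sketch: rather than running every operator at the highest frequency and finishing far ahead of the deadline, pick the lowest frequency level whose predicted end-to-end latency still meets the deadline. The frequency levels, the toy latency/energy models, and all numeric values below are illustrative assumptions, not values or APIs from the paper.

```python
# Illustrative sketch of deadline-aware frequency selection ("punctual
# laziness"): choose the laziest frequency that still meets the deadline.

# Available core frequency levels in GHz (hypothetical, ascending).
FREQS_GHZ = [0.8, 1.2, 1.6, 2.0]

def predicted_latency_ms(op_work, freq_ghz):
    """Toy per-operator latency model: work units divided by frequency."""
    return op_work / freq_ghz

def predicted_energy_mj(op_work, freq_ghz):
    """Toy per-operator energy model: dynamic power grows ~f^2."""
    power_w = 0.5 * freq_ghz ** 2
    return power_w * predicted_latency_ms(op_work, freq_ghz)

def pick_frequency(op_works, deadline_ms):
    """Return the lowest frequency whose total predicted latency fits
    within the deadline; fall back to the maximum frequency otherwise."""
    for f in FREQS_GHZ:  # ascending order: try the laziest level first
        total = sum(predicted_latency_ms(w, f) for w in op_works)
        if total <= deadline_ms:
            return f
    return FREQS_GHZ[-1]

# Example: three operators with different work amounts, 10 ms deadline.
ops = [4.0, 2.0, 6.0]
chosen = pick_frequency(ops, deadline_ms=10.0)
```

Because the energy model is superlinear in frequency, the laziest feasible level is also the cheapest one here; the actual system additionally scales memory frequency per operator, which this sketch omits.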
Pages: 4339-4354
Number of pages: 16