We present nnPerf, a real-time on-device profiler designed to collect and analyze DNN model run-time inference latency on mobile platforms. nnPerf demystifies the hidden layers and metrics used for pursuing DNN optimizations and adaptations at the granularity of operators and kernels, ensuring that every facet contributing to a DNN model's run-time efficiency is easily accessible to mobile developers via well-defined APIs. With nnPerf, mobile developers can easily identify bottlenecks in model run-time efficiency and optimize the model architecture to meet system-level objectives (SLOs). We implement nnPerf on the TFLite framework and evaluate its e2e-, operator-, and kernel-latency profiling accuracy across four mobile platforms. The results show that nnPerf achieves consistently high latency profiling accuracy on both CPU (98.12%) and GPU (99.87%). Our benchmark studies demonstrate that running nnPerf on mobile devices introduces minimal overhead to model inference, with only 0.231% and 0.605% extra inference latency and power consumption, respectively. We further present a case study showing how we leverage nnPerf to migrate OFA, a SOTA NAS system, to kernel-oriented model optimization on GPUs.
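The kind of operator-level latency breakdown described above can be illustrated with a minimal timing sketch. All class and method names here are hypothetical placeholders, not nnPerf's actual API; the point is only to show what a per-operator breakdown exposed to developers might look like:

```python
import time

class OpProfiler:
    """Toy per-operator latency profiler (illustrative only; not the nnPerf API).

    Times each operator invocation and reports each operator's share of
    the total measured latency, mimicking an operator-granularity breakdown.
    """

    def __init__(self):
        self.records = []  # list of (operator_name, elapsed_seconds)

    def run_op(self, name, fn, *args):
        # Time a single operator call and record its latency.
        start = time.perf_counter()
        out = fn(*args)
        self.records.append((name, time.perf_counter() - start))
        return out

    def breakdown(self):
        # Fraction of total measured latency attributed to each operator.
        total = sum(t for _, t in self.records)
        return {name: t / total for name, t in self.records}

profiler = OpProfiler()
x = profiler.run_op("conv", lambda v: [e * 2 for e in v], list(range(1000)))
x = profiler.run_op("relu", lambda v: [max(0, e) for e in v], x)
shares = profiler.breakdown()
print(sorted(shares))  # operator names with per-operator latency shares
```

A developer could use such a breakdown to spot the dominant operator and target it for optimization, which is the bottleneck-identification workflow the abstract describes.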