nnPerf: Demystifying DNN Runtime Inference Latency on Mobile Platforms

Cited by: 3
Authors
Chu, Haolin [1 ]
Zheng, Xiaolong [1 ]
Liu, Liang [1 ]
Ma, Huadong [1 ]
Affiliation
[1] Beijing University of Posts and Telecommunications, Beijing, China
Source
PROCEEDINGS OF THE 21ST ACM CONFERENCE ON EMBEDDED NETWORKED SENSOR SYSTEMS, SENSYS 2023 | 2023
Funding
National Natural Science Foundation of China;
Keywords
Mobile GPU; Deep Neural Network; Inference latency; Profiling;
DOI
10.1145/3625687.3625797
Chinese Library Classification (CLC)
TP301 [Theory and Methods];
Subject Classification Code
081202;
Abstract
We present nnPerf, a real-time on-device profiler designed to collect and analyze DNN model run-time inference latency on mobile platforms. nnPerf demystifies the hidden layers and metrics used for pursuing DNN optimizations and adaptations at the granularity of operators and kernels, ensuring every facet contributing to a DNN model's run-time efficiency is easily accessible to mobile developers via well-defined APIs. With nnPerf, mobile developers can easily identify the bottleneck in model run-time efficiency and optimize the model architecture to meet system-level objectives (SLO). We implement nnPerf on the TFLite framework and evaluate its e2e-, operator-, and kernel-latency profiling accuracy across four mobile platforms. The results show that nnPerf achieves consistently high latency profiling accuracy on both CPU (98.12%) and GPU (99.87%). Our benchmark studies demonstrate that running nnPerf on mobile devices introduces minimal overhead to model inference, with 0.231% and 0.605% extra inference latency and power consumption, respectively. We further run a case study to show how we leverage nnPerf to migrate OFA, a SOTA NAS system, to kernel-oriented model optimization on GPUs.
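As a rough illustration of the kind of measurement nnPerf automates, the sketch below times end-to-end TFLite inference using only the standard tf.lite.Interpreter Python API. It is a baseline e2e measurement under assumed names (the model file, run count, and float32 input are illustrative); nnPerf's operator- and kernel-level profiling APIs are not public in this record and are therefore not reproduced.

# Minimal sketch (not nnPerf itself): baseline end-to-end latency measurement
# with the standard tf.lite.Interpreter Python API.
import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")  # hypothetical model file
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

dummy = np.random.random_sample(inp["shape"]).astype(np.float32)  # assumes a float32 input
interpreter.set_tensor(inp["index"], dummy)
interpreter.invoke()  # warm-up run

runs = 50
start = time.perf_counter()
for _ in range(runs):
    interpreter.set_tensor(inp["index"], dummy)
    interpreter.invoke()
print(f"mean e2e latency: {(time.perf_counter() - start) / runs * 1e3:.2f} ms")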
Pages: 125-137
Page count: 13