Low Latency Implementations of CNN for Resource-Constrained IoT Devices

Times cited: 5
Authors
Mujtaba, Ahmed [1]
Lee, Wai-Kong [2]
Hwang, Seong Oun [2]
Affiliations
[1] Gachon Univ, Dept IT Convergence Engn, Seongnam 13120, South Korea
[2] Gachon Univ, Dept Comp Engn, Seongnam 13120, South Korea
Funding
National Research Foundation of Singapore
Keywords
Convolutional neural networks; Internet-of-Things; microcontrollers; tiny ML
DOI
10.1109/TCSII.2022.3205029
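Chinese Library Classification (CLC) numbers
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract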
Convolutional Neural Network (CNN) inference on a resource-constrained Internet-of-Things (IoT) device (here, an ARM Cortex-M microcontroller) requires careful optimization to reduce timing overhead. We propose two novel techniques that improve the computational efficiency of CNNs on low-cost microcontrollers. Our techniques utilize on-chip memory and minimize redundant operations, yielding low-latency inference on complex quantized models such as MobileNetV1. On the ImageNet dataset with per-layer quantization, we reduce inference latency and Multiply-and-Accumulate (MAC) per cycle by 22.4% and 22.9%, respectively, compared to the state-of-the-art mixed-precision CMix-NN library. On the CIFAR-10 dataset with per-channel quantization, we reduce inference latency and MAC per cycle by 31.7% and 31.3%, respectively. The achieved low-latency inference can improve the user experience and save power in resource-constrained IoT devices.
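The abstract reports results for two quantization granularities: per-layer (ImageNet) and per-channel (CIFAR-10). The following minimal sketch illustrates that distinction only; it is not the authors' method or the CMix-NN API. It assumes a symmetric int8 scheme in which each scale maps the largest absolute weight onto 127, and the function names and weight values are hypothetical. Per-layer quantization shares one scale across every output channel of a layer, while per-channel quantization gives each output channel its own scale.

```python
"""Per-layer vs. per-channel symmetric int8 weight quantization (illustrative sketch).

Assumption: symmetric int8 with range [-127, 127]; this is not necessarily the
scheme used in the paper. All names and values here are hypothetical.
"""

from typing import List, Tuple

Q_MAX = 127  # assumed symmetric int8 limit


def scale_for(values: List[float]) -> float:
    """Return the scale that maps max |v| onto Q_MAX (1.0 if all values are zero)."""
    peak = max(abs(v) for v in values)
    return peak / Q_MAX if peak > 0 else 1.0


def quantize(values: List[float], scale: float) -> List[int]:
    """Round each value to the nearest integer step and clamp to [-Q_MAX, Q_MAX]."""
    return [max(-Q_MAX, min(Q_MAX, round(v / scale))) for v in values]


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integers back to real values with the given scale."""
    return [x * scale for x in q]


def per_layer(weights: List[List[float]]) -> Tuple[List[float], List[List[int]]]:
    """One scale shared by every output channel (row) of the layer."""
    s = scale_for([w for ch in weights for w in ch])
    return [s] * len(weights), [quantize(ch, s) for ch in weights]


def per_channel(weights: List[List[float]]) -> Tuple[List[float], List[List[int]]]:
    """One scale per output channel (row) of the weight matrix."""
    scales = [scale_for(ch) for ch in weights]
    return scales, [quantize(ch, s) for ch, s in zip(weights, scales)]


def max_abs_error(weights: List[List[float]], scales: List[float],
                  qweights: List[List[int]]) -> float:
    """Largest absolute reconstruction error over all weights."""
    return max(
        abs(w - r)
        for ch, s, q in zip(weights, scales, qweights)
        for w, r in zip(ch, dequantize(q, s))
    )


if __name__ == "__main__":
    # Two hypothetical output channels with very different value ranges.
    w = [[0.4, -1.0, 0.25], [0.012, -0.02, 0.005]]
    for name, fn in (("per-layer", per_layer), ("per-channel", per_channel)):
        scales, q = fn(w)
        print(name, q, f"max error = {max_abs_error(w, scales, q):.6f}")
```

With these hypothetical weights, the small-range second channel collapses to the integers [2, -3, 1] under the shared per-layer scale but spreads to [76, -127, 32] with its own scale, so per-channel quantization reconstructs it more accurately at the cost of storing one scale per output channel.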
Pages: 5124-5128
Page count: 5
Related papers
12 in total
[1] Anderson, Andrew; Vasudevan, Aravind; Keane, Cormac; Gregg, David. High-Performance Low-Memory Lowering: GEMM-based Algorithms for DNN Convolution. 2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2020), 2020: 99-106.
[2] [Anonymous]. Proceedings of the 4th International Workshop on OpenCL, 2016. DOI 10.1145/2909437.2909443.
[3] Burrello, Alessio; Garofalo, Angelo; Bruschi, Nazareno; Tagliavini, Giuseppe; Rossi, Davide; Conti, Francesco. DORY: Automatic End-to-End Deployment of Real-World DNNs on Low-Cost IoT MCUs. IEEE Transactions on Computers, 2021, 70(8): 1253-1268.
[4] Cai, Xuyi; Wang, Ying; Tu, Kaijie; Gao, Chengsi; Zhang, Lei. Olympus: Reaching Memory-Optimality on DNN Processors. IEEE Transactions on Computers, 2022, 71(8): 1939-1951.
[5] Capotondi, Alessandro; Rusci, Manuele; Fariselli, Marco; Benini, Luca. CMix-NN: Mixed Low-Precision CNN Library for Memory-Constrained Edge Devices. IEEE Transactions on Circuits and Systems II: Express Briefs, 2020, 67(5): 871-875.
[6] Du W. GREEN ELECT.
[7] Howard AG. arXiv, 2017. arXiv:1704.04861, DOI 10.48550/arXiv.1704.04861.
[8] Jacob, Benoit; Kligys, Skirmantas; Chen, Bo; Zhu, Menglong; Tang, Matthew; Howard, Andrew; Adam, Hartwig; Kalenichenko, Dmitry. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 2704-2713.
[9] Lai LZ. arXiv, 2018. arXiv:1801.06601, DOI 10.48550/arXiv.1801.06601.
[10] Rusci M. arXiv, 2019. arXiv:1905.13082.