Fully Dynamic Inference With Deep Neural Networks

Cited by: 25
Authors
Xia, Wenhan [1 ]
Yin, Hongxu [2 ]
Dai, Xiaoliang [3 ]
Jha, Niraj K. [1 ]
Affiliations
[1] Princeton Univ, Dept Elect Engn, Princeton, NJ 08540 USA
[2] NVIDIA, Santa Clara, CA 95050 USA
[3] Facebook, Mobile Comp Vis Team, Menlo Pk, CA 94025 USA
Funding
U.S. National Science Foundation;
Keywords
Conditional computation; deep learning; dynamic execution; dynamic inference; model compression;
DOI
10.1109/TETC.2021.3056031
CLC Classification Number
TP [Automation technology; computer technology];
Discipline Code
0812;
Abstract
Modern deep neural networks are powerful and widely applicable models that extract task-relevant information through multi-level abstraction. Their cross-domain success, however, is often achieved at the expense of high computational cost, memory bandwidth, and inference latency, which prevents their deployment in resource-constrained and time-sensitive scenarios, such as edge-side inference and self-driving cars. While recently developed methods for creating efficient deep neural networks are making their real-world deployment more feasible by reducing model size, they do not fully exploit input properties on a per-instance basis to maximize computational efficiency and task accuracy. In particular, most existing methods use a one-size-fits-all approach that processes all inputs identically. Motivated by the fact that different images require different feature embeddings to be accurately classified, we propose a fully dynamic paradigm that endows deep convolutional neural networks with hierarchical inference dynamics at the level of layers and individual convolutional filters/channels. Two compact networks, called Layer-Net (L-Net) and Channel-Net (C-Net), predict on a per-instance basis which layers or filters/channels are redundant and should therefore be skipped. L-Net and C-Net also learn how to scale retained computation outputs to maximize task accuracy. By integrating L-Net and C-Net into a joint design framework, called LC-Net, we consistently outperform state-of-the-art dynamic frameworks in both efficiency and classification accuracy. On the CIFAR-10 dataset, LC-Net results in up to 11.9x fewer floating-point operations (FLOPs) and up to 3.3 percent higher accuracy than other dynamic inference methods. On the ImageNet dataset, LC-Net achieves up to 1.4x fewer FLOPs and up to 4.6 percent higher Top-1 accuracy than the other methods.
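The abstract describes per-instance gating at the layer and channel level with learned scaling of retained outputs. Below is a minimal, hypothetical PyTorch sketch of the layer-skipping idea only: a tiny gate summarizes the input, decides whether to execute a block, and scales its output if kept. The class names (LayerGate, DynamicLayer), the straight-through gating trick, and the identity skip path are illustrative assumptions, not the paper's exact L-Net/C-Net/LC-Net design.

```python
# Illustrative sketch of per-instance dynamic layer skipping with learned
# scaling, in the spirit of the L-Net idea summarized in the abstract.
# Names and gating details are assumptions for illustration only.
import torch
import torch.nn as nn


class LayerGate(nn.Module):
    """Tiny controller that, per input, predicts whether to run a layer
    and how to scale its output if it is run."""
    def __init__(self, in_channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)       # summarize the feature map
        self.fc = nn.Linear(in_channels, 2)       # -> [keep logit, scale logit]

    def forward(self, x: torch.Tensor):
        z = self.pool(x).flatten(1)               # (N, C)
        logits = self.fc(z)
        keep_prob = torch.sigmoid(logits[:, :1])  # probability of executing the layer
        scale = torch.relu(logits[:, 1:]) + 1e-3  # positive scale for retained output
        # Hard keep/skip decision with a straight-through estimator so that
        # training remains differentiable through the soft probability.
        keep_hard = (keep_prob > 0.5).float()
        keep = keep_hard + keep_prob - keep_prob.detach()
        return keep.view(-1, 1, 1, 1), scale.view(-1, 1, 1, 1)


class DynamicLayer(nn.Module):
    """Wraps a conv block; the gate decides per instance whether to skip it."""
    def __init__(self, channels: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.gate = LayerGate(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        keep, scale = self.gate(x)
        # Skipped instances fall back to the identity path; kept ones are scaled.
        return x + keep * scale * self.block(x)


if __name__ == "__main__":
    layer = DynamicLayer(channels=16)
    feats = torch.randn(4, 16, 32, 32)            # a batch of 4 feature maps
    out = layer(feats)
    print(out.shape)                              # torch.Size([4, 16, 32, 32])
```

Masking the branch with a zero gate, as above, only illustrates the decision logic; an actual deployment would route skipped inputs around the block entirely to realize the FLOP savings reported in the paper.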
Pages: 962-972 (11 pages)