Fully Dynamic Inference With Deep Neural Networks

Cited by: 25
Authors
Xia, Wenhan [1 ]
Yin, Hongxu [2 ]
Dai, Xiaoliang [3 ]
Jha, Niraj K. [1 ]
Affiliations
[1] Princeton Univ, Dept Elect Engn, Princeton, NJ 08540 USA
[2] NVIDIA, Santa Clara, CA 95050 USA
[3] Facebook, Mobile Comp Vis Team, Menlo Pk, CA 94025 USA
Funding
U.S. National Science Foundation;
Keywords
Conditional computation; deep learning; dynamic execution; dynamic inference; model compression;
DOI
10.1109/TETC.2021.3056031
CLC Classification Number
TP [Automation technology; computer technology];
Discipline Code
0812;
Abstract
Modern deep neural networks are powerful and widely applicable models that extract task-relevant information through multi-level abstraction. Their cross-domain success, however, is often achieved at the expense of high computational cost, memory bandwidth, and inference latency, which prevents their deployment in resource-constrained and time-sensitive scenarios, such as edge-side inference and self-driving cars. While recently developed methods for creating efficient deep neural networks are making their real-world deployment more feasible by reducing model size, they do not fully exploit input properties on a per-instance basis to maximize computational efficiency and task accuracy. In particular, most existing methods use a one-size-fits-all approach that processes all inputs identically. Motivated by the fact that different images require different feature embeddings to be accurately classified, we propose a fully dynamic paradigm that endows deep convolutional neural networks with hierarchical inference dynamics at the level of layers and individual convolutional filters/channels. Two compact networks, called Layer-Net (L-Net) and Channel-Net (C-Net), predict on a per-instance basis which layers or filters/channels are redundant and should therefore be skipped. L-Net and C-Net also learn how to scale retained computation outputs to maximize task accuracy. By integrating L-Net and C-Net into a joint design framework, called LC-Net, we consistently outperform state-of-the-art dynamic frameworks in both efficiency and classification accuracy. On the CIFAR-10 dataset, LC-Net results in up to 11.9x fewer floating-point operations (FLOPs) and up to 3.3 percent higher accuracy than other dynamic inference methods. On the ImageNet dataset, LC-Net achieves up to 1.4x fewer FLOPs and up to 4.6 percent higher Top-1 accuracy than the other methods.
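The abstract describes per-instance gating at the layer and channel level with learned scaling of retained outputs. Below is a minimal, hypothetical PyTorch sketch of the layer-skipping idea only: a tiny gate summarizes the input, decides whether to execute a block, and scales its output if kept. The class names (LayerGate, DynamicLayer), the straight-through gating trick, and the identity skip path are illustrative assumptions, not the paper's exact L-Net/C-Net/LC-Net design.

```python
# Illustrative sketch of per-instance dynamic layer skipping with learned
# scaling, in the spirit of the L-Net idea summarized in the abstract.
# Names and gating details are assumptions for illustration only.
import torch
import torch.nn as nn


class LayerGate(nn.Module):
    """Tiny controller that, per input, predicts whether to run a layer
    and how to scale its output if it is run."""
    def __init__(self, in_channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)       # summarize the feature map
        self.fc = nn.Linear(in_channels, 2)       # -> [keep logit, scale logit]

    def forward(self, x: torch.Tensor):
        z = self.pool(x).flatten(1)               # (N, C)
        logits = self.fc(z)
        keep_prob = torch.sigmoid(logits[:, :1])  # probability of executing the layer
        scale = torch.relu(logits[:, 1:]) + 1e-3  # positive scale for retained output
        # Hard keep/skip decision with a straight-through estimator so that
        # training remains differentiable through the soft probability.
        keep_hard = (keep_prob > 0.5).float()
        keep = keep_hard + keep_prob - keep_prob.detach()
        return keep.view(-1, 1, 1, 1), scale.view(-1, 1, 1, 1)


class DynamicLayer(nn.Module):
    """Wraps a conv block; the gate decides per instance whether to skip it."""
    def __init__(self, channels: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.gate = LayerGate(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        keep, scale = self.gate(x)
        # Skipped instances fall back to the identity path; kept ones are scaled.
        return x + keep * scale * self.block(x)


if __name__ == "__main__":
    layer = DynamicLayer(channels=16)
    feats = torch.randn(4, 16, 32, 32)            # a batch of 4 feature maps
    out = layer(feats)
    print(out.shape)                              # torch.Size([4, 16, 32, 32])
```

Masking the branch with a zero gate, as above, only illustrates the decision logic; an actual deployment would route skipped inputs around the block entirely to realize the FLOP savings reported in the paper.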
Pages: 962-972 (11 pages)