DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Vision Transformers

Cited by: 45
Authors
Li, Changlin [1]
Wang, Guangrun [2]
Wang, Bing [3]
Liang, Xiaodan [4]
Li, Zhihui [5]
Chang, Xiaojun [1]
Affiliations
[1] Univ Technol Sydney, Australian Artificial Intelligence Inst, Ultimo, NSW 2007, Australia
[2] Univ Oxford, Dept Engn Sci, Oxford OX1 2JD, England
[3] Alibaba Grp, Hangzhou 311121, Peoples R China
[4] Sun Yat Sen Univ, Guangzhou 510275, Guangdong, Peoples R China
[5] Qilu Univ Technol, Shandong Acad Sci, Shandong Artificial Intelligence, Jinan 250316, Shandong, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China; Australian Research Council;
Keywords
Training; Logic gates; Routing; Transformers; Neural networks; Optimization; Computer architecture; Adaptive inference; dynamic networks; dynamic pruning; efficient inference; efficient transformer; vision transformer;
DOI
10.1109/TPAMI.2022.3194044
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Dynamic networks have shown promising capability in reducing theoretical computation complexity by adapting their architectures to the input during inference. However, their practical runtime usually lags behind the theoretical acceleration due to inefficient sparsity. In this paper, we explore a hardware-efficient dynamic inference regime, named dynamic weight slicing, which generalizes well across multiple dimensions in both CNNs and transformers (e.g., kernel size, embedding dimension, number of heads). Instead of adaptively selecting important weight elements in a sparse way, we pre-define dense weight slices with different importance levels by nested residual learning. During inference, weights are progressively sliced from the most important elements to less important ones to achieve different model capacities for inputs with diverse difficulty levels. Based on this conception, we present DS-CNN++ and DS-ViT++ by carefully designing the double-headed dynamic gate and the overall network architecture. We further propose dynamic idle slicing to address the drastic reduction of embedding dimension in DS-ViT++. To ensure sub-network generality and routing fairness, we propose a disentangled two-stage optimization scheme. In Stage I, in-place bootstrapping (IB) and multi-view consistency (MvCo) are proposed to stabilize and improve the training of the DS-CNN++ and DS-ViT++ supernets, respectively. In Stage II, sandwich gate sparsification (SGS) is proposed to assist the gate training. Extensive experiments on 4 datasets and 3 different network architectures demonstrate that our methods consistently outperform state-of-the-art static and dynamic model compression methods by a large margin (up to 6.6%). Typically, we achieve 2-4x computation reduction and up to 61.5% real-world acceleration on MobileNet, ResNet-50, and Vision Transformer, with minimal accuracy drops on ImageNet. Code release: https://github.com/changlin31/DS-Net.
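The core idea in the abstract — keep the weight dense but use only its leading, most-important portion for easy inputs — can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's implementation: the function name `sliced_matvec` and the fixed ratio set are invented here, and the rows of the weight are simply assumed to be pre-ordered by importance (which the paper obtains via nested residual learning and an input-dependent gate).

```python
# Toy sketch of dynamic weight slicing (illustrative names, not the
# paper's code). The weight matrix W stores its rows from most to least
# important; at inference, a ratio chosen per input decides how many
# leading rows to use. The sliced weight is a contiguous dense block,
# which is what makes this hardware-friendly compared to sparse masks.

def sliced_matvec(W, x, ratio):
    """Multiply x by only the leading `ratio` fraction of W's rows."""
    k = max(1, int(len(W) * ratio))
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in W[:k]]

# 4x3 weight; rows assumed already sorted by importance.
W = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1],
     [1, 1, 1]]
x = [2.0, 3.0, 4.0]

print(len(sliced_matvec(W, x, 0.5)))   # 2 outputs: "easy" input, small slice
print(len(sliced_matvec(W, x, 1.0)))   # 4 outputs: "hard" input, full capacity
```

Because every slice is a dense prefix of the same tensor, all capacity levels share storage and the smaller slices run as ordinary dense matrix multiplies, avoiding the irregular memory access that makes sparse dynamic pruning slow in practice.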
Pages: 4430-4446
Page count: 17
References
123 in total
[21] Cubuk E. D.; Zoph B.; Mane D.; Vasudevan V.; Le Q. V. AutoAugment: Learning Augmentation Strategies from Data. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019: 113-123.
[22] d'Ascoli S.; Touvron H.; Leavitt M. L.; Morcos A. S.; Biroli G.; Sagun L. ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases. Journal of Statistical Mechanics: Theory and Experiment, 2022, 2022(11).
[23] Dai J.; Qi H.; Xiong Y.; Li Y.; Zhang G.; Hu H.; Wei Y. Deformable Convolutional Networks. 2017 IEEE International Conference on Computer Vision (ICCV), 2017: 764-773.
[24] Dehghani M. Proceedings of the International Conference on Learning Representations, 2019.
[25] Deng J. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009: 248. DOI: 10.1109/CVPRW.2009.5206848.
[26] Devlin J. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019), Vol. 1, 2019: 4171.
[27] Dong X.; Huang J.; Yang Y.; Yan S. More is Less: A More Complicated Network with Less Inference Complexity. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017: 1895-1903.
[28] Dosovitskiy A. Proceedings of ICLR, 2020: 1.
[29] Elbayad M. IEEE GLOBE WORK, 2020.
[30] Fedus W. arXiv:2101.03961, 2022.