Improving Lightweight AdderNet via Distillation From ℓ2 to ℓ1-norm

Cited by: 1
Authors
Dong, Minjing [1 ]
Chen, Xinghao [2 ]
Wang, Yunhe [2 ]
Xu, Chang [1 ]
Affiliations
[1] Univ Sydney, Fac Engn, Sch Comp Sci, Darlington, NSW 2008, Australia
[2] Huawei Noah's Ark Lab, Beijing 100085, Peoples R China
Funding
Australian Research Council;
Keywords
Correlation; Quantization (signal); Optimization; Knowledge engineering; Energy consumption; Convolutional neural networks; Convolution; Adder neural network; knowledge distillation; lightweight network;
DOI
10.1109/TIP.2023.3318940
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
To achieve efficient inference with a hardware-friendly design, Adder Neural Networks (ANNs) replace the expensive multiplication operations in Convolutional Neural Networks (CNNs) with cheap additions by using the ℓ1-norm as the similarity measure instead of cosine distance. However, we observe an increasing accuracy gap between CNNs and ANNs as the number of parameters is reduced, and this gap cannot be eliminated by existing algorithms. In this paper, we present a simple yet effective Norm-Guided Distillation (NGD) method that lets ℓ1-norm ANNs learn superior performance from ℓ2-norm ANNs. Although CNNs achieve accuracy similar to that of ℓ2-norm ANNs, the clustering behavior induced by the ℓ2-distance is more easily learned by ℓ1-norm ANNs than the cross-correlation used in CNNs. The features in ℓ2-norm ANNs are further encouraged toward intra-class centralization and inter-class decentralization to amplify this advantage. Furthermore, the roughly estimated gradients in vanilla ANNs are replaced with a progressive approximation from ℓ2-norm to ℓ1-norm so that more accurate optimization can be achieved. Extensive evaluations on several benchmarks demonstrate the effectiveness of NGD on lightweight networks. For example, our method improves ANN accuracy by 10.43% with 0.25x GhostNet on CIFAR-100 and by 3.1% with 1.0x GhostNet on ImageNet.
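To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch: an adder layer that scores each input patch by its negative ℓ1 distance to each filter (replacing the multiply-accumulate of a standard convolution), and an interpolated gradient that moves from an ℓ2-like full-precision gradient (x − w) toward the true ℓ1 gradient sign(x − w). The function names, shapes, and the linear interpolation schedule are illustrative assumptions, not the authors' actual NGD implementation.

```python
import numpy as np

def adder2d_naive(x, w):
    """Naive AdderNet-style layer (illustrative, not the paper's code).

    Similarity is the negative l1 distance between each input patch and
    each filter, replacing the cross-correlation of a convolution.
    x: (H, W, C_in) feature map; w: (K, K, C_in, C_out) filters.
    Stride 1, no padding.
    """
    H, W, C_in = x.shape
    K, _, _, C_out = w.shape
    out = np.empty((H - K + 1, W - K + 1, C_out))
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            patch = x[i:i + K, j:j + K, :]            # (K, K, C_in)
            # Broadcast patch against all filters and sum |patch - w|
            out[i, j, :] = -np.abs(patch[..., None] - w).sum(axis=(0, 1, 2))
    return out

def progressive_grad(x_minus_w, t):
    """Sketch of a progressive l2-to-l1 gradient (assumed schedule).

    Interpolates between the full-precision gradient (x - w), used in
    vanilla ANN training, and the exact l1 gradient sign(x - w), with t
    annealed from 0 to 1 over training. The linear form is an assumption
    for illustration, not necessarily the paper's exact formulation.
    """
    return (1.0 - t) * x_minus_w + t * np.sign(x_minus_w)
```

For instance, a 1x1 all-zero filter applied to an all-ones 2x2 input yields −1 at every position, since each patch sits at ℓ1 distance 1 from the filter.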
Pages: 5524-5536
Page count: 13