Convolution without multiplication: A general speed up strategy for CNNs

Cited: 8
Authors
Cai GuoRong [1 ]
Yang ShengMing [1 ]
Du Jing [1 ]
Wang ZongYue [1 ]
Huang Bin [1 ]
Guan Yin [3 ,4 ]
Su SongJian [5 ]
Su JinHe [1 ]
Su SongZhi [2 ]
Affiliations
[1] Jimei Univ, Coll Comp Engn, Xiamen 361021, Peoples R China
[2] Xiamen Univ, Dept Artificial Intelligence, Xiamen 361005, Peoples R China
[3] NetDragon Websoft Inc, Lab Big Data & Artificial Intelligence, Fuzhou 350001, Peoples R China
[4] Minjiang Univ, Coll Comp & Control Engn, Fuzhou 350108, Peoples R China
[5] Ropeok Inc, Res Inst 3, Xiamen 361008, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
deep learning; convolutional neural network; network quantization; network speed up;
DOI
10.1007/s11431-021-1936-2
CLC classification number
T [Industrial Technology];
Subject classification code
08;
Abstract
Convolutional Neural Networks (CNNs) have achieved great success in many computer vision tasks. However, it is difficult to deploy CNN models on low-cost devices with limited power budgets, because most existing CNN models are computationally expensive. Therefore, CNN model compression and acceleration have become a hot research topic in the deep learning area. Typical schemes for speeding up the feed-forward process with a slight accuracy loss include parameter pruning and sharing, low-rank factorization, compact convolutional filters, and knowledge distillation. In this study, we propose a general acceleration scheme in which floating-point multiplication is replaced by integer addition. The motivation is that every floating-point number can be approximated by a sum of exponential (power-of-two) terms, so a multiplication between two floating-point numbers can be converted into additions over their exponents. In the experiment section, we directly apply the proposed scheme to AlexNet, VGG, and ResNet for image classification, and to Faster-RCNN for object detection. The results on ImageNet and PASCAL VOC show that the proposed quantization scheme achieves promising performance, even with only a single exponential term. Moreover, we analyze the efficiency of our method on mainstream FPGAs. The experimental results show that the proposed quantization scheme achieves acceleration on FPGA with only a slight accuracy loss.
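The single-term case described in the abstract can be illustrated with a minimal sketch (not the authors' implementation): if a weight is rounded to one signed power-of-two term, multiplying an activation by that weight reduces to adding an integer to its exponent, i.e. a shift. The NumPy usage and the function names quantize_pow2 and mul_as_shift below are illustrative assumptions, not names from the paper.

    import numpy as np

    def quantize_pow2(w):
        # Approximate each weight by one signed power-of-two term: w ~ sign * 2**exp
        sign = np.sign(w)
        exp = np.round(np.log2(np.abs(w) + 1e-12)).astype(np.int32)
        return sign, exp

    def mul_as_shift(x, sign, exp):
        # Multiplying by 2**exp only adds 'exp' to the floating-point exponent of x
        # (ldexp(x, e) computes x * 2**e without a general multiplication)
        return sign * np.ldexp(x, exp)

    # Sanity check: compare the shift-based product against exact multiplication
    w = np.random.randn(8).astype(np.float32)
    x = np.random.randn(8).astype(np.float32)
    s, e = quantize_pow2(w)
    print(np.abs(mul_as_shift(x, s, e) - x * w))

Using more than one exponential term per weight would reduce the approximation error at the cost of additional shift-add operations, which is the trade-off the abstract alludes to.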
Pages: 2627-2639
Number of pages: 13