QD-Compressor: a Quantization-based Delta Compression Framework for Deep Neural Networks

Cited by: 5
Authors
Zhang, Shuyu [1 ]
Wu, Donglei [1 ]
Jin, Haoyu [1 ]
Zou, Xiangyu [1 ]
Xia, Wen [1 ,2 ,3 ]
Huang, Xiaojia [1 ]
Affiliations
[1] Harbin Institute of Technology, Shenzhen, People's Republic of China
[2] Peng Cheng Laboratory, Cyberspace Security Research Center, Shenzhen, People's Republic of China
[3] Chinese Academy of Sciences, State Key Laboratory of Computer Architecture, Institute of Computing Technology, Beijing, People's Republic of China
Source
2021 IEEE 39th International Conference on Computer Design (ICCD 2021) | 2021
Funding
National Natural Science Foundation of China;
Keywords
Neural networks; Delta compression; Quantization;
DOI
10.1109/ICCD53106.2021.00088
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology];
Discipline code
0812;
Abstract
Deep neural networks (DNNs) have achieved remarkable success in many fields. However, large-scale DNNs introduce storage challenges when snapshots are kept to guard against frequent cluster failures, and generate massive internet traffic when models are dispatched or updated on resource-constrained devices (e.g., IoT devices and mobile phones). Several approaches aim to compress DNNs. The recent work Delta-DNN observes high similarity between neighboring versions of a DNN and therefore compresses their differences to improve the compression ratio. However, we observe that Delta-DNN, which applies a traditional global lossy quantization technique when calculating the differences between two neighboring versions of a DNN, cannot fully exploit the data similarity between them for delta compression. This is because the parameters' value ranges (and thus the delta data in Delta-DNN) vary across layers, which motivates us to propose a local-sensitive quantization scheme: the quantizers adapt to the parameters' local value ranges in each layer. Moreover, instead of quantizing the differences between DNNs as Delta-DNN does, our approach quantizes the DNNs before calculating the differences, which makes the differences more compressible. In addition, we propose an error feedback mechanism to reduce the accuracy loss caused by lossy quantization. Based on these techniques, we design a novel quantization-based delta compressor called QD-Compressor, which calculates lossy differences between epochs of a DNN to reduce both the storage cost of backing up DNN snapshots and the internet traffic of dispatching DNNs to resource-constrained devices. Experiments on several popular DNNs and datasets show that QD-Compressor achieves a compression ratio 2.4x to 31.5x higher than state-of-the-art approaches while well maintaining the model's test accuracy.
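To make the idea in the abstract concrete, below is a minimal sketch of per-layer ("local-sensitive") quantization, quantize-then-diff, and error feedback. It assumes simple uniform quantization; the function names (quantize_layer, delta_snapshot) and parameters (bits, residuals) are illustrative assumptions, not QD-Compressor's actual design or API.

```python
# Hypothetical sketch of per-layer quantize-then-diff with error feedback;
# not the authors' implementation.
import numpy as np

def quantize_layer(w, bits=8):
    """Uniform quantization using this layer's own value range (local-sensitive)."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / (2 ** bits - 1) or 1.0          # avoid divide-by-zero for constant layers
    q = np.round((w - lo) / scale).astype(np.int32)     # integer codes
    deq = q * scale + lo                                 # dequantized values
    return q, deq

def delta_snapshot(curr_layers, prev_q, residuals, bits=8):
    """Quantize the current snapshot layer by layer (with error feedback),
    then emit the integer delta against the previous quantized snapshot."""
    deltas, new_q = {}, {}
    for name, w in curr_layers.items():
        w_fb = w + residuals.get(name, 0.0)              # add back last quantization error
        q, deq = quantize_layer(w_fb, bits)
        residuals[name] = w_fb - deq                     # carry this step's quantization error
        base = prev_q.get(name, np.zeros_like(q))
        deltas[name] = q - base                          # mostly zeros/small -> compressible
        new_q[name] = q
    return deltas, new_q
```

Because consecutive epochs change most parameters only slightly, integer deltas produced this way contain many zeros and small values, which a generic lossless compressor can then shrink effectively; the paper's actual quantizer design and compression backend may differ from this sketch.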
Pages: 542-550
Number of pages: 9
References
30 in total
  • [1] [Anonymous]. In: Proceedings of the 3rd International Conference on Learning Representations (ICLR), 2015.
  • [2] Hinton, Geoffrey; Vinyals, Oriol; Dean, Jeff. Distilling the Knowledge in a Neural Network. 2015.
  • [3] [Anonymous]. In: Proceedings of ICLR, 2019.
  • [4] [Anonymous]. In: Proceedings of ICPP, 2020.
  • [5] Dean, Jeffrey; Barroso, Luiz Andre. The Tail at Scale. Communications of the ACM, 2013, 56(2): 74-80.
  • [6] Di Martino, Catello; Kalbarczyk, Zbigniew; Iyer, Ravishankar K.; Baccanico, Fabio; Fullop, Joseph; Kramer, William. Lessons Learned From the Analysis of System Failures at Petascale: The Case of Blue Waters. In: 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2014: 610-621.
  • [7] Girshick, Ross; Donahue, Jeff; Darrell, Trevor; Malik, Jitendra. Rich feature hierarchies for accurate object detection and semantic segmentation. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014: 580-587.
  • [8] Guo, Y. W. Advances in Neural Information Processing Systems, Vol. 29, 2016.
  • [9] Gupta, Saurabh; Patel, Tirthak; Engelmann, Christian; Tiwari, Devesh. Failures in Large Scale Systems: Long-term Measurement, Analysis, and Implications. In: SC'17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2017.
  • [10] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian. Deep Residual Learning for Image Recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 770-778.