Latent Weight Quantization for Integerized Training of Deep Neural Networks

Cited by: 0
Authors
Fei, Wen [1 ]
Dai, Wenrui [2 ]
Zhang, Liang [3 ]
Zhang, Luoming [4 ]
Li, Chenglin [1 ]
Zou, Junni [2 ]
Xiong, Hongkai [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Elect Engn, Shanghai 200240, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[3] Donghua Univ, Sch Comp Sci & Technol, Shanghai 201620, Peoples R China
[4] Zhejiang Univ, Key Lab Biomed Engn, Minist Educ, Hangzhou 310027, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Quantization (signal); Training; Perturbation methods; Memory management; Hardware; Trajectory; Random access memory; Graphics processing units; Computational modeling; Noise; Integerized training; deep neural network quantization; latent weight; dual quantizer; large language models;
DOI
10.1109/TPAMI.2025.3527498
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Existing methods for integerized training speed up deep learning by using low-bitwidth integerized weights, activations, gradients, and optimizer buffers. However, they overlook the full-precision latent weights, which consume excessive memory to accumulate gradient-based updates for optimizing the integerized weights. In this paper, we propose the first latent weight quantization scheme for general integerized training, which minimizes the quantization perturbation to the training process via residual quantization with an optimized dual quantizer. We leverage residual quantization to eliminate the correlation between the latent weights and the integerized weights, suppressing quantization noise. We further propose a dual quantizer with an optimal nonuniform codebook that avoids frozen weights and keeps the training trajectory statistically unbiased, matching that of full-precision latent weights. The codebook is optimized under importance guidance to minimize the disturbance on weight updates and is realized with a three-segment polyline approximation for hardware-friendly implementation. Extensive experiments show that the proposed scheme enables integerized training with latent weights as low as 4 bits for various architectures, including ResNets, MobileNetV2, and Transformers, with negligible performance loss in image classification and text generation. Furthermore, we successfully fine-tune large language models with up to 13 billion parameters on a single GPU using the proposed scheme.
Pages: 2816-2832
Number of pages: 17
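As a rough illustration of the residual-quantization idea described in the abstract (storing a low-bit residual between the latent weight and the integerized weight instead of the full-precision latent weight), the following PyTorch-style sketch may help. It is a minimal sketch under assumed symmetric uniform quantizers and illustrative bit-widths; all function names and parameters are hypothetical, and the paper's actual dual quantizer with an optimized nonuniform (three-segment polyline) codebook is not reproduced here.

```python
# Minimal illustrative sketch of residual latent-weight quantization.
# Assumptions (not from the paper): symmetric uniform quantizers, per-tensor
# scales, 8-bit integerized weights, and 4-bit residuals.
import torch


def quantize_uniform(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform fake-quantization to `bits` bits (returns dequantized values)."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale


def compress_latent(latent_w: torch.Tensor, w_bits: int = 8, res_bits: int = 4):
    """Keep the integerized weight plus a low-bit residual instead of the
    full-precision latent weight; the residual is decorrelated from w_int."""
    w_int = quantize_uniform(latent_w, w_bits)  # integerized weight used in forward/backward
    residual_q = quantize_uniform(latent_w - w_int, res_bits)
    return w_int, residual_q


def recover_latent(w_int: torch.Tensor, residual_q: torch.Tensor) -> torch.Tensor:
    """Approximate latent weight reconstructed before the next gradient update."""
    return w_int + residual_q


# Toy usage: one SGD-like step accumulated on the reconstructed latent weight.
latent = torch.randn(256, 256)
grad = torch.randn_like(latent)
w_int, res_q = compress_latent(latent)
latent_next = recover_latent(w_int, res_q) - 1e-2 * grad
w_int_next, res_q_next = compress_latent(latent_next)  # re-compress for the next step
```

In this sketch, only the integerized weight and the low-bit residual are kept between steps, which is where the memory saving over a full-precision latent weight would come from; the actual scheme additionally designs the residual codebook under importance guidance so that small updates are not frozen out.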