Latent Weight Quantization for Integerized Training of Deep Neural Networks

Times Cited: 0
Authors
Fei, Wen [1 ]
Dai, Wenrui [2 ]
Zhang, Liang [3 ]
Zhang, Luoming [4 ]
Li, Chenglin [1 ]
Zou, Junni [2 ]
Xiong, Hongkai [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Elect Engn, Shanghai 200240, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[3] Donghua Univ, Sch Comp Sci & Technol, Shanghai 201620, Peoples R China
[4] Zhejiang Univ, Key Lab Biomed Engn, Minist Educ, Hangzhou 310027, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Quantization (signal); Training; Perturbation methods; Memory management; Hardware; Trajectory; Random access memory; Graphics processing units; Computational modeling; Noise; Integerized training; deep neural network quantization; latent weight; dual quantizer; large language models;
DOI
10.1109/TPAMI.2025.3527498
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing methods for integerized training speed up deep learning by using low-bitwidth integerized weights, activations, gradients, and optimizer buffers. However, they overlook the full-precision latent weights, which consume excessive memory to accumulate gradient-based updates for optimizing the integerized weights. In this paper, we propose the first latent weight quantization scheme for general integerized training, which minimizes the quantization perturbation to the training process via residual quantization with an optimized dual quantizer. We leverage residual quantization to eliminate the correlation between the latent weight and the integerized weight, thereby suppressing quantization noise. We further propose a dual quantizer with an optimal nonuniform codebook that avoids frozen weights and keeps the training trajectory statistically unbiased, as with full-precision latent weights. The codebook is optimized to minimize the disturbance to weight updates under importance guidance and is realized with a three-segment polyline approximation for hardware-friendly implementation. Extensive experiments show that the proposed scheme enables integerized training with latent weights as low as 4-bit for various architectures, including ResNets, MobileNetV2, and Transformers, with negligible performance loss in image classification and text generation. Furthermore, we successfully fine-tune large language models with up to 13 billion parameters on a single GPU using the proposed scheme.
Pages: 2816 - 2832
Number of Pages: 17
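
Below is a minimal sketch of the general idea described in the abstract: instead of keeping the full-precision latent weight in memory, store the low-bitwidth integerized weight plus a quantized residual between the two. Everything here is an illustrative assumption, not the paper's method: the helper names (quantize_uniform, dequantize), the 4-bit widths, and the plain symmetric uniform quantizers are placeholders; the paper's optimized dual quantizer, importance-guided nonuniform codebook, and three-segment polyline approximation are not reproduced.

# Illustrative sketch only (assumed names and bitwidths, uniform quantizers);
# the paper's dual quantizer and optimized codebook are not reproduced here.
import numpy as np

def quantize_uniform(x, num_bits, scale):
    """Symmetric uniform quantizer: map floats to clipped integer codes."""
    qmax = 2 ** (num_bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)

def dequantize(q, scale):
    """Map integer codes back to floats."""
    return q.astype(np.float32) * scale

# Full-precision latent weight that standard integerized training keeps in
# memory to accumulate gradient-based updates.
rng = np.random.default_rng(0)
w_latent = rng.normal(0.0, 0.05, size=(256, 256)).astype(np.float32)

# Integerized weight actually used in the forward/backward pass (4-bit here).
w_scale = np.abs(w_latent).max() / 7.0
w_int = quantize_uniform(w_latent, num_bits=4, scale=w_scale)

# Quantize the residual between latent and integerized weight, rather than the
# latent weight itself, so the part already captured by w_int is removed and
# the remaining quantization noise stays small.
residual = w_latent - dequantize(w_int, w_scale)
r_scale = np.abs(residual).max() / 7.0
r_int = quantize_uniform(residual, num_bits=4, scale=r_scale)

# Reconstructed latent weight used when accumulating the next update.
w_latent_hat = dequantize(w_int, w_scale) + dequantize(r_int, r_scale)
print("latent reconstruction error:", np.abs(w_latent - w_latent_hat).mean())

In this toy setup the residual is exactly what otherwise forces the latent weight to remain in full precision; quantizing it separately keeps the reconstruction error small. Per the abstract, the paper goes further by replacing the uniform residual quantizer with an optimized dual quantizer so that weights do not freeze and the training trajectory remains statistically unbiased.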